MarkdownHeaderTextSplitterError
langchain.text_splitter.markdown_header_text_splitter.MarkdownHeaderTextSplitterError
Stack trace
langchain.text_splitter.markdown_header_text_splitter.MarkdownHeaderTextSplitterError: Failed to parse markdown headers: unexpected header format at line 12
File "/usr/local/lib/python3.9/site-packages/langchain/text_splitter/markdown_header_text_splitter.py", line 78, in split_text
raise MarkdownHeaderTextSplitterError(f"Failed to parse markdown headers: {err}")
Why it happens
MarkdownHeaderTextSplitter expects markdown text to have well-formed headers (e.g., #, ##, ###) to split content correctly. If the markdown contains malformed headers, inconsistent spacing, or unsupported header styles, the splitter raises this error because it cannot reliably identify chunk boundaries.
Detection
Monitor logs for MarkdownHeaderTextSplitterError exceptions and validate input markdown format before splitting to catch malformed headers early.
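The pre-split validation described above could be sketched as follows. This is a minimal example that uses only the standard library; the function name find_suspect_headers and the rule it applies (1 to 6 hashes followed by whitespace or end of line) are assumptions for illustration, not part of LangChain. Lines inside ``` fences are skipped so that code comments are not reported.

```python
import re

# Well-formed ATX header: 1-6 hashes followed by whitespace or end of line.
ATX_OK = re.compile(r"^#{1,6}(?:\s|$)")


def find_suspect_headers(markdown: str) -> list[tuple[int, str]]:
    """Return (line_number, line) pairs for lines that start with '#'
    but are not well-formed ATX headers (hypothetical helper)."""
    suspects = []
    in_fence = False
    for number, line in enumerate(markdown.splitlines(), start=1):
        stripped = line.strip()
        # Toggle on fence markers so '#' comments inside code are ignored.
        if stripped.startswith("```"):
            in_fence = not in_fence
            continue
        if in_fence:
            continue
        if stripped.startswith("#") and not ATX_OK.match(stripped):
            suspects.append((number, line))
    return suspects
```

Running this before the splitter and logging the returned line numbers gives the same kind of location hint as the "at line 12" part of the error message, without waiting for the exception.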
Causes & fixes
Cause: Headers have inconsistent or missing leading hashes (#), or no space after them.
Fix: Use standard markdown header syntax with the correct number of hashes followed by a space, e.g., '# Header 1', '## Header 2'.
Cause: Non-header lines resemble headers but do not conform to markdown header syntax.
Fix: Preprocess the markdown to remove or repair these lines before passing it to the splitter.
Cause: The markdown uses a flavor or extension that the splitter does not support (e.g., Setext-style headers).
Fix: Convert the headers to ATX style (#), or use a different splitter that supports your markdown flavor.
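Two of the fixes above, adding the missing space after the hashes and converting Setext headers to ATX, can be sketched with the standard library alone. The function name normalize_headers is hypothetical, and the sketch makes simplifying assumptions: it skips ``` fenced code, treats any text line directly above an === or --- underline as a Setext title, and does not handle front matter or lines such as hashtags that start with '#' but are not meant to be headers.

```python
import re

# ATX header missing the space after its hashes, e.g. "##Header2".
ATX_NO_SPACE = re.compile(r"^(#{1,6})([^#\s].*)$")
# Setext underlines: "===" marks level 1, "---" marks level 2.
SETEXT_UNDERLINE = re.compile(r"^(=+|-+)\s*$")


def normalize_headers(markdown: str) -> str:
    """Rewrite headers as well-formed ATX headers (hypothetical helper)."""
    out = []
    in_fence = False
    prev_is_text = False  # True when the previous line can be a Setext title
    for line in markdown.splitlines():
        if line.lstrip().startswith("```"):
            in_fence = not in_fence
            out.append(line)
            prev_is_text = False
            continue
        if in_fence:
            out.append(line)  # leave code untouched
            continue
        underline = SETEXT_UNDERLINE.match(line)
        if underline and prev_is_text:
            # Replace the title line with an ATX header and drop the underline.
            hashes = "#" if underline.group(1).startswith("=") else "##"
            out[-1] = f"{hashes} {out[-1].strip()}"
            prev_is_text = False
            continue
        fixed = ATX_NO_SPACE.sub(r"\1 \2", line)
        out.append(fixed)
        prev_is_text = (
            bool(fixed.strip()) and not fixed.startswith("#") and not underline
        )
    return "\n".join(out)
```

Lines with more than six hashes are left unchanged, since they are not valid headers at any level and need a manual decision.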
Code: broken vs fixed
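# Broken: no space between the hashes and the header text
from langchain.text_splitter import MarkdownHeaderTextSplitter

# headers_to_split_on is a required argument: (marker, metadata key) pairs
headers_to_split_on = [("#", "Header 1"), ("##", "Header 2")]
text = """#Header1
Some content
##Header2
More content"""
splitter = MarkdownHeaderTextSplitter(headers_to_split_on=headers_to_split_on)
chunks = splitter.split_text(text)  # This line raises MarkdownHeaderTextSplitterError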
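# Fixed: a space follows the hashes in every header
from langchain.text_splitter import MarkdownHeaderTextSplitter

# headers_to_split_on is a required argument: (marker, metadata key) pairs
headers_to_split_on = [("#", "Header 1"), ("##", "Header 2")]
text = """# Header1
Some content
## Header2
More content"""
splitter = MarkdownHeaderTextSplitter(headers_to_split_on=headers_to_split_on)
chunks = splitter.split_text(text)  # Works without error
print(chunks)
Workaround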
Wrap the split_text call in try/except MarkdownHeaderTextSplitterError. On failure, preprocess the markdown to fix the header formatting and retry, or fall back to a simpler splitter such as CharacterTextSplitter.
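The control flow of this workaround might look like the sketch below. It deliberately avoids importing LangChain: split_by_headers, repair, and split_plain are hypothetical callables you supply, and error_type is whichever exception class your installed version actually raises. None of these names come from the library.

```python
from typing import Callable, List, Type


def split_with_fallback(
    text: str,
    split_by_headers: Callable[[str], List[str]],
    repair: Callable[[str], str],
    split_plain: Callable[[str], List[str]],
    error_type: Type[Exception],
) -> List[str]:
    """Try the header splitter, retry once on repaired text, then fall back."""
    try:
        return split_by_headers(text)
    except error_type:
        pass  # fall through to the repair attempt
    try:
        return split_by_headers(repair(text))
    except error_type:
        # Header-based splitting is not possible; use the simpler splitter.
        return split_plain(text)
```

In practice, split_by_headers would wrap the header splitter's split_text method, repair would be your header-normalization step, and split_plain would wrap the simpler splitter. Retrying only once keeps a persistently malformed document from looping.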
Prevention
Enforce markdown input validation and normalization in your ingestion pipeline to guarantee well-formed ATX headers before chunking with MarkdownHeaderTextSplitter.