ValueError
builtins.ValueError
Stack trace
Traceback (most recent call last):
  File "app.py", line 42, in <module>
    chunks = text_splitter.split_text_with_metadata(text)
  File "/usr/local/lib/python3.9/site-packages/langchain/text_splitter.py", line 88, in split_text_with_metadata
    raise ValueError("Chunk metadata lost during split")
ValueError: Chunk metadata lost during split
Why it happens
When splitting text into chunks, the metadata associated with the original text (such as source, index, or custom tags) must be explicitly preserved and attached to each chunk. If the splitting method returns only raw text chunks without reattaching or propagating metadata, this error is raised to prevent silent data loss.
Detection
Monitor your chunking pipeline to ensure that after splitting, each chunk retains its metadata fields; add assertions or logging to verify metadata presence before downstream processing.
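A minimal sketch of such an assertion, assuming chunks expose a metadata dict the way LangChain Document objects do. The Chunk dataclass, REQUIRED_KEYS, and check_chunk_metadata are hypothetical stand-ins for illustration, not part of LangChain.

```python
from dataclasses import dataclass, field


@dataclass
class Chunk:
    # Hypothetical stand-in for a LangChain Document: chunk text plus a metadata dict.
    page_content: str
    metadata: dict = field(default_factory=dict)


# Hypothetical set of fields every chunk must carry into downstream processing.
REQUIRED_KEYS = frozenset({"source"})


def check_chunk_metadata(chunks, required_keys=REQUIRED_KEYS):
    """Raise ValueError if any chunk lacks a metadata dict or a required field."""
    for i, chunk in enumerate(chunks):
        metadata = getattr(chunk, "metadata", None)
        if not isinstance(metadata, dict):
            # Plain strings, such as split_text() output, have no metadata at all.
            raise ValueError(f"Chunk {i} has no metadata")
        missing = set(required_keys) - set(metadata)
        if missing:
            raise ValueError(f"Chunk {i} is missing metadata fields: {sorted(missing)}")
    return chunks


good = [Chunk("Long document", {"source": "doc1"}), Chunk("text here", {"source": "doc1"})]
check_chunk_metadata(good)  # passes silently

try:
    check_chunk_metadata(["Long document", "text here"])
except ValueError as err:
    print(err)  # Chunk 0 has no metadata
```

Running the check right after splitting, before embedding or indexing, makes a missing field fail at the point where it was lost rather than later in the pipeline.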
Causes & fixes
Using a text splitter method that returns only plain text chunks without metadata objects
Switch to a splitter method or class that returns chunk objects including metadata, such as split_documents() instead of split_text()
Manually splitting text but forgetting to copy or assign metadata to each chunk
Explicitly copy the metadata dictionary from the original document to each chunk after splitting
Custom splitter implementation that does not handle metadata propagation
Modify the splitter code to accept and return metadata along with text chunks, preserving all relevant fields
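For the custom-splitter case, a minimal sketch of a fixed-size character splitter that returns metadata alongside each chunk. split_with_metadata is a hypothetical helper and much simpler than RecursiveCharacterTextSplitter; chunk_size and chunk_overlap mirror the parameter names used below, and the per-chunk chunk_index field is an illustrative assumption. Each chunk receives its own copy of the source metadata, so adding a per-chunk field does not leak into sibling chunks.

```python
def split_with_metadata(text, metadata, chunk_size=100, chunk_overlap=20):
    """Split text into overlapping fixed-size windows, pairing each with its own metadata.

    Hypothetical helper: returns a list of (chunk_text, chunk_metadata) tuples.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must satisfy 0 <= chunk_overlap < chunk_size")
    if not text:
        return []

    step = chunk_size - chunk_overlap
    chunks = []
    start = 0
    while True:
        piece = text[start:start + chunk_size]
        # Shallow copy per chunk; use copy.deepcopy if metadata holds nested mutable values.
        chunk_metadata = dict(metadata)
        chunk_metadata["chunk_index"] = len(chunks)
        chunks.append((piece, chunk_metadata))
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
        start += step
    return chunks


source_metadata = {"source": "doc1"}
for piece, meta in split_with_metadata("abcdefghij", source_metadata, chunk_size=4, chunk_overlap=2):
    print(piece, meta)
# abcd {'source': 'doc1', 'chunk_index': 0}
# cdef {'source': 'doc1', 'chunk_index': 1}
# efgh {'source': 'doc1', 'chunk_index': 2}
# ghij {'source': 'doc1', 'chunk_index': 3}
```

Assigning the same dict object to every chunk would look equivalent at first, but the first per-chunk update (such as chunk_index) would then overwrite the value seen by all chunks; copying avoids that.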
Code: broken vs fixed
from langchain.text_splitter import RecursiveCharacterTextSplitter
text = "Long document text here"
metadata = {"source": "doc1"}
text_splitter = RecursiveCharacterTextSplitter(chunk_size=100, chunk_overlap=20)
chunks = text_splitter.split_text(text)  # Returns plain strings; the metadata dict is never attached
# Code that expects per-chunk metadata then fails with: ValueError: Chunk metadata lost during split

from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.schema import Document
doc = Document(page_content="Long document text here", metadata={"source": "doc1"})
text_splitter = RecursiveCharacterTextSplitter(chunk_size=100, chunk_overlap=20)
chunks = text_splitter.split_documents([doc]) # Preserves metadata in each chunk
print(chunks)  # Works without error; each chunk keeps metadata={'source': 'doc1'}
Workaround
Wrap the splitting call in try/except ValueError; if metadata is lost, fall back to a text-only split and reattach the metadata by iterating over the chunks and assigning each one a copy of the original metadata dictionary, so chunks do not share a single mutable dict.
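A minimal sketch of that workaround. split_or_reattach, failing_split, and plain_split are hypothetical names: split_fn stands in for whichever metadata-aware call raises the error, plain_split_fn for a text-only splitter in the spirit of split_text(), and the dict-based chunk shape is an assumption for illustration.

```python
def split_or_reattach(text, metadata, split_fn, plain_split_fn):
    """Try a metadata-aware split; on ValueError, fall back and reattach metadata."""
    try:
        return split_fn(text, metadata)
    except ValueError:
        # Fallback: split to plain strings, then attach a copy of the metadata to each.
        return [
            {"page_content": piece, "metadata": dict(metadata)}
            for piece in plain_split_fn(text)
        ]


def failing_split(text, metadata):
    # Hypothetical splitter that reproduces the error from the stack trace above.
    raise ValueError("Chunk metadata lost during split")


def plain_split(text, size=10):
    # Hypothetical text-only splitter: fixed-size slices with no metadata.
    return [text[i:i + size] for i in range(0, len(text), size)]


chunks = split_or_reattach("Long document text here", {"source": "doc1"}, failing_split, plain_split)
for chunk in chunks:
    print(chunk)
```

Catching every ValueError can also hide unrelated failures, so it may be worth checking the error message before falling back.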
Prevention
Always use chunking methods that explicitly support metadata propagation, such as split_documents(), and design custom splitters to handle metadata alongside text.