High severity · Intermediate · Fix: 5-10 min

ValueError

builtins.ValueError

What this error means
This error is raised when text splitting drops the metadata attached to the original text, so the resulting chunks lose identifying context such as their source.

Stack trace

traceback
Traceback (most recent call last):
  File "app.py", line 42, in <module>
    chunks = text_splitter.split_text_with_metadata(text)
  File "/usr/local/lib/python3.9/site-packages/langchain/text_splitter.py", line 88, in split_text_with_metadata
    raise ValueError("Chunk metadata lost during split")
ValueError: Chunk metadata lost during split
QUICK FIX
Use split_documents() instead of split_text() to preserve metadata automatically during chunking.

Why it happens

When splitting text into chunks, the metadata associated with the original text (such as source, index, or custom tags) must be explicitly preserved and attached to each chunk. If the splitting method returns only raw text chunks without reattaching or propagating metadata, this error is raised to prevent silent data loss.

Detection

Monitor your chunking pipeline to ensure that after splitting, each chunk retains its metadata fields; add assertions or logging to verify metadata presence before downstream processing.
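
A check along these lines could sit right after the split step. It is a minimal sketch: `Chunk` is a hypothetical stand-in for whatever chunk object your pipeline produces (anything exposing a `metadata` dict, such as a LangChain `Document`), and `find_chunks_missing_metadata` and its `required_keys` parameter are illustrative names, not part of any library.

```python
from dataclasses import dataclass, field


@dataclass
class Chunk:
    """Hypothetical stand-in for a chunk object exposing a `metadata` dict."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def find_chunks_missing_metadata(chunks, required_keys):
    """Return (index, missing_keys) pairs for chunks lacking any required key."""
    problems = []
    for index, chunk in enumerate(chunks):
        # Plain strings have no `metadata` attribute, so they count as empty.
        metadata = getattr(chunk, "metadata", None) or {}
        missing = [key for key in required_keys if key not in metadata]
        if missing:
            problems.append((index, missing))
    return problems


chunks = [
    Chunk("first part", {"source": "doc1"}),
    Chunk("second part"),  # metadata was dropped somewhere upstream
]
problems = find_chunks_missing_metadata(chunks, required_keys=["source"])
for index, missing in problems:
    print(f"chunk {index} is missing metadata keys: {missing}")
```

Returning the list of problems, rather than asserting inside the helper, leaves it to the caller to decide whether to log, raise, or repair the affected chunks.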

Causes & fixes

1. Using a text splitter method that returns only plain text chunks without metadata objects

✓ Fix: Switch to a splitter method or class that returns chunk objects including metadata, such as split_documents() instead of split_text()

2. Manually splitting text but forgetting to copy or assign metadata to each chunk

✓ Fix: Explicitly copy the metadata dictionary from the original document to each chunk after splitting

3. Custom splitter implementation that does not handle metadata propagation

✓ Fix: Modify the splitter code to accept and return metadata along with text chunks, preserving all relevant fields
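
Fixes 2 and 3 both come down to carrying the metadata through the split. The sketch below shows one way to do that in a hand-rolled splitter. The function name `split_with_metadata`, the fixed-width character window, and the added `chunk_index` field are assumptions made for illustration; none of them are LangChain APIs.

```python
import copy


def split_with_metadata(text, metadata, chunk_size=100, chunk_overlap=20):
    """Split `text` into overlapping windows, each paired with its own metadata copy."""
    if chunk_size <= 0 or not 0 <= chunk_overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= chunk_overlap < chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    start = 0
    index = 0
    while start < len(text):
        # Deep-copy so nested values (lists, dicts) are not shared between chunks.
        chunk_metadata = copy.deepcopy(metadata)
        chunk_metadata["chunk_index"] = index  # illustrative extra field
        chunks.append({"text": text[start:start + chunk_size], "metadata": chunk_metadata})
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
        start += step
        index += 1
    return chunks


original_metadata = {"source": "doc1", "tags": ["demo"]}
chunks = split_with_metadata("x" * 250, original_metadata)
for chunk in chunks:
    print(len(chunk["text"]), chunk["metadata"])
```

Copying the metadata per chunk, instead of assigning the same dictionary object to all of them, means a later edit to one chunk's metadata does not leak into the others or into the original.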

Code: broken vs fixed

Broken - triggers the error
python
from langchain.text_splitter import RecursiveCharacterTextSplitter

text = "Long document text here"
metadata = {"source": "doc1"}
text_splitter = RecursiveCharacterTextSplitter(chunk_size=100, chunk_overlap=20)
chunks = text_splitter.split_text(text)  # Returns a list of plain strings; `metadata` is never passed in, so it is lost
# Raises ValueError: Chunk metadata lost during split
Fixed - works correctly
python
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.schema import Document

# Wrap the text and its metadata in a Document so the splitter can carry both.
doc = Document(page_content="Long document text here", metadata={"source": "doc1"})
text_splitter = RecursiveCharacterTextSplitter(chunk_size=100, chunk_overlap=20)
chunks = text_splitter.split_documents([doc])  # Each chunk is a Document that keeps the metadata
for chunk in chunks:
    print(chunk.page_content, chunk.metadata)  # Metadata stays attached to every chunk
Changed from split_text(), which returns plain strings, to split_documents(), which returns Document objects with the metadata preserved.

Workaround

Wrap the splitting call in try/except ValueError; if metadata is lost, manually reattach metadata by iterating over chunks and assigning the original metadata dictionary.
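
The workaround might look like the sketch below. Both callables are hypothetical parameters: `metadata_aware_split` stands for whichever call raises the `ValueError` in your code, and `plain_split` for any splitter that returns plain strings. The two stand-in functions exist only to make the example runnable; nothing here is a LangChain API.

```python
def split_with_fallback(text, metadata, metadata_aware_split, plain_split):
    """Try the metadata-aware splitter; on ValueError, split plainly and reattach metadata."""
    try:
        return metadata_aware_split(text, metadata)
    except ValueError:
        # Fall back: split into plain strings, then give each chunk its own
        # (shallow) copy of the original metadata dictionary.
        return [
            {"text": piece, "metadata": dict(metadata)}
            for piece in plain_split(text)
        ]


def failing_split(text, metadata):
    """Stand-in that fails with the message shown in the stack trace."""
    raise ValueError("Chunk metadata lost during split")


def split_on_blank_lines(text):
    """Stand-in plain splitter: one chunk per non-empty paragraph."""
    return [part for part in text.split("\n\n") if part.strip()]


chunks = split_with_fallback(
    "First paragraph.\n\nSecond paragraph.",
    {"source": "doc1"},
    failing_split,
    split_on_blank_lines,
)
print(chunks)
```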

Prevention

Always use chunking methods that explicitly support metadata propagation, such as split_documents(), and design custom splitters to handle metadata alongside text.

Python 3.9+ · langchain-core >=0.1.0 · tested on 0.2.x
Verified 2026-04