ValueError
langchain.text_splitter.RecursiveCharacterTextSplitter.ValueError
Stack trace
ValueError: chunk_size must be larger than chunk_overlap
File "/path/to/langchain/text_splitter.py", line 123, in split_text
raise ValueError("chunk_size must be larger than chunk_overlap")
File "/path/to/your_script.py", line 45, in <module>
chunks = text_splitter.split_text(long_text) # triggers error
Why it happens
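RecursiveCharacterTextSplitter requires chunk_size to be strictly larger than chunk_overlap. The overlap is the amount of text carried over from the end of one chunk into the start of the next, so if chunk_overlap is equal to or larger than chunk_size, a chunk has no room for new text and the splitter cannot make progress; the library raises a ValueError instead. Depending on the installed LangChain version, the check may run when the splitter is constructed rather than inside split_text, and the exact wording of the message may differ from the trace above. The usual triggers are misconfigured parameters or one default overridden without adjusting the other.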
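To make the "cannot progress" argument concrete, here is a minimal sketch of a simplified fixed-width sliding window. It is not LangChain's actual algorithm (the recursive splitter works on separators), and window_starts is a hypothetical helper; it only illustrates why the stride chunk_size - chunk_overlap has to be positive.

```python
def window_starts(text_len: int, chunk_size: int, chunk_overlap: int) -> list[int]:
    """Start offsets of the windows in a simplified sliding-window model."""
    stride = chunk_size - chunk_overlap  # new characters consumed per chunk
    if stride <= 0:
        # Without forward movement, a loop over windows would never finish.
        raise ValueError("chunk_size must be larger than chunk_overlap")
    # Stop once the remaining tail is already covered by the previous window.
    return list(range(0, max(text_len - chunk_overlap, 1), stride))


print(window_starts(2600, 1000, 200))  # [0, 800, 1600]
```

With chunk_size=200 and chunk_overlap=300 the stride would be -100, which is the situation the ValueError guards against.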
Detection
Check parameter values before splitting: assert chunk_size > chunk_overlap. Log or validate splitter config at initialization to catch invalid settings early.
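One possible shape for such a check is sketched below. check_splitter_params and the logger name are hypothetical, not part of LangChain; the sketch raises explicitly instead of using assert, on the assumption that the check should survive running Python with -O (which strips assert statements).

```python
import logging

logger = logging.getLogger("splitter_config")  # hypothetical logger name


def check_splitter_params(chunk_size: int, chunk_overlap: int) -> None:
    """Log the splitter configuration and fail fast if it cannot work."""
    logger.info("splitter config: chunk_size=%d chunk_overlap=%d",
                chunk_size, chunk_overlap)
    if chunk_size <= 0:
        raise ValueError(f"chunk_size must be positive, got {chunk_size}")
    if chunk_overlap < 0:
        raise ValueError(f"chunk_overlap must not be negative, got {chunk_overlap}")
    if chunk_size <= chunk_overlap:
        raise ValueError(
            f"chunk_size ({chunk_size}) must be larger than "
            f"chunk_overlap ({chunk_overlap})"
        )


# Call this right before constructing the splitter.
check_splitter_params(chunk_size=1000, chunk_overlap=200)  # passes silently
```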
Causes & fixes
Cause: chunk_size is less than or equal to chunk_overlap, so the splitter cannot advance through the text.
Fix: Set chunk_size to a value strictly greater than chunk_overlap, e.g., chunk_size=1000 and chunk_overlap=200.
Cause: Only one of the two parameters is overridden, and the new value conflicts with the default of the other (for example, a small chunk_size combined with the default chunk_overlap).
Fix: Specify both chunk_size and chunk_overlap explicitly, with values appropriate for your text length and use case.
Cause: chunk_size is derived from the input, and empty or very short text yields a value at or below chunk_overlap.
Fix: Validate the input text length before splitting; skip splitting for texts that already fit in one chunk, or adjust chunk_size and chunk_overlap together.
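The short-text handling described above could be sketched as follows. plan_split is a hypothetical helper, and capping the overlap at 20% of chunk_size is an arbitrary assumption chosen for illustration, not a LangChain rule.

```python
from typing import Optional, Tuple


def plan_split(text: str, chunk_size: int = 1000,
               chunk_overlap: int = 200) -> Optional[Tuple[int, int]]:
    """Return None if no split is needed, else safe (chunk_size, chunk_overlap)."""
    if chunk_size <= 0:
        raise ValueError(f"chunk_size must be positive, got {chunk_size}")
    if len(text) <= chunk_size:
        return None  # the text already fits in a single chunk
    # Assumption: cap the overlap at 20% of chunk_size so it stays strictly smaller.
    safe_overlap = min(chunk_overlap, chunk_size // 5)
    return chunk_size, safe_overlap


print(plan_split("short note"))                                  # None
print(plan_split("x" * 500, chunk_size=100, chunk_overlap=300))  # (100, 20)
```

A caller would pass the returned pair to the splitter constructor, and use the text as a single chunk when the result is None.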
Code: broken vs fixed
# Broken: chunk_overlap (300) is larger than chunk_size (200).
# Raises ValueError; depending on the version, already at construction.
from langchain.text_splitter import RecursiveCharacterTextSplitter
text_splitter = RecursiveCharacterTextSplitter(chunk_size=200, chunk_overlap=300)
chunks = text_splitter.split_text(long_text)

# Fixed: chunk_size > chunk_overlap
from langchain.text_splitter import RecursiveCharacterTextSplitter
long_text = "Some long document text. " * 200  # placeholder input for illustration
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
chunks = text_splitter.split_text(long_text)
print(f"Split into {len(chunks)} chunks")
Workaround
Wrap both the splitter construction and the split_text call in try/except ValueError, since the parameter check may run at either point. If the error is caught, fall back to a simpler splitter such as CharacterTextSplitter with safe parameters.
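A minimal sketch of this fallback pattern follows. To keep it runnable without LangChain installed, it uses a hypothetical stand-in class, FixedWidthSplitter; split_with_fallback and the factory arguments are likewise illustrative names, not library API.

```python
from typing import Callable, List, Protocol


class Splitter(Protocol):
    def split_text(self, text: str) -> List[str]: ...


def split_with_fallback(make_primary: Callable[[], Splitter],
                        make_fallback: Callable[[], Splitter],
                        text: str) -> List[str]:
    """Try the primary splitter; on ValueError, build and use the fallback."""
    try:
        # Construction happens inside the try block because the parameter
        # check may run in the constructor rather than in split_text.
        return make_primary().split_text(text)
    except ValueError:
        return make_fallback().split_text(text)


class FixedWidthSplitter:
    """Hypothetical stand-in for a real splitter class."""

    def __init__(self, chunk_size: int, chunk_overlap: int = 0) -> None:
        if chunk_size <= chunk_overlap:
            raise ValueError("chunk_size must be larger than chunk_overlap")
        self.chunk_size = chunk_size
        self.stride = chunk_size - chunk_overlap

    def split_text(self, text: str) -> List[str]:
        return [text[i:i + self.chunk_size]
                for i in range(0, len(text), self.stride)]


chunks = split_with_fallback(
    lambda: FixedWidthSplitter(chunk_size=200, chunk_overlap=300),  # invalid
    lambda: FixedWidthSplitter(chunk_size=1000, chunk_overlap=0),   # safe
    "x" * 2500,
)
print(len(chunks))  # 3
```

With LangChain, the two factories would instead construct RecursiveCharacterTextSplitter and CharacterTextSplitter. Note that a silent fallback hides the misconfiguration, so logging the caught error is probably worthwhile.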
Prevention
Always validate and enforce chunk_size > chunk_overlap in your splitter configuration and add unit tests to catch invalid parameter combinations before deployment.
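One way to combine enforcement and tests is sketched below, assuming the parameters live in a small config object of your own. SplitterConfig is a hypothetical class, not part of LangChain; its defaults simply mirror the values used in the fixed example.

```python
import unittest
from dataclasses import dataclass


@dataclass(frozen=True)
class SplitterConfig:
    """Hypothetical config object; validation runs once, at creation."""
    chunk_size: int = 1000
    chunk_overlap: int = 200

    def __post_init__(self) -> None:
        if self.chunk_size <= self.chunk_overlap:
            raise ValueError(
                f"chunk_size ({self.chunk_size}) must be larger than "
                f"chunk_overlap ({self.chunk_overlap})"
            )


class SplitterConfigTest(unittest.TestCase):
    def test_valid_config_is_accepted(self):
        cfg = SplitterConfig(chunk_size=1000, chunk_overlap=200)
        self.assertEqual((cfg.chunk_size, cfg.chunk_overlap), (1000, 200))

    def test_overlap_larger_than_size_is_rejected(self):
        with self.assertRaises(ValueError):
            SplitterConfig(chunk_size=200, chunk_overlap=300)

    def test_overlap_equal_to_size_is_rejected(self):
        with self.assertRaises(ValueError):
            SplitterConfig(chunk_size=200, chunk_overlap=200)
```

Saved in a test module, these cases run with python -m unittest, so an invalid combination fails in CI rather than in production.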