How to use RecursiveCharacterTextSplitter in LangChain

Quick answer

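RecursiveCharacterTextSplitter splits long text into chunks of at most chunk_size characters. It tries a list of separators in order (by default paragraph breaks, then line breaks, then spaces, then individual characters) and only falls back to a finer separator when a piece is still too long, so paragraphs and sentences stay together where possible. Instantiate it with chunk_size and chunk_overlap, then call split_text() on a string or split_documents() on a list of Document objects.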
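
The fallback through coarser-to-finer separators can be sketched in plain Python. This is a simplified illustration, not LangChain's actual implementation: the function name recursive_split is hypothetical, it ignores chunk_overlap, and it drops separators instead of keeping them.

```python
def recursive_split(text, chunk_size, separators=("\n\n", "\n", " ", "")):
    """Simplified sketch: split on the coarsest separator, recurse on oversized pieces."""
    if len(text) <= chunk_size:
        return [text] if text else []

    if not separators or separators[0] == "":
        # Last resort: hard cut every chunk_size characters.
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

    sep, finer = separators[0], separators[1:]
    if sep not in text:
        # This separator does not occur, so try the next finer one.
        return recursive_split(text, chunk_size, finer)

    chunks, current = [], ""
    for piece in text.split(sep):
        if not piece:
            continue
        candidate = piece if not current else current + sep + piece
        if len(candidate) <= chunk_size:
            current = candidate  # piece still fits, keep merging
            continue
        if current:
            chunks.append(current)
            current = ""
        if len(piece) <= chunk_size:
            current = piece
        else:
            # The piece alone is too long: retry it with finer separators.
            chunks.extend(recursive_split(piece, chunk_size, finer))
    if current:
        chunks.append(current)
    return chunks


sample = (
    "First paragraph is short.\n\n"
    "Second paragraph is quite a bit longer and will not fit in one chunk."
)
for chunk in recursive_split(sample, chunk_size=40):
    print(len(chunk), repr(chunk))
```

The short paragraph survives intact, while the long one is broken at word boundaries because no paragraph or line break inside it can help.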

Prerequisites

Python 3.8+
pip install "langchain>=0.2" langchain-text-splitters
Basic familiarity with LangChain document loaders

Setup

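Install LangChain if you haven't already, and make sure you are on Python 3.8 or higher. Quote the version specifier so the shell does not treat > as an output redirect. The splitters live in the separate langchain-text-splitters package, which recent LangChain releases may not pull in automatically, so install it explicitly.

bash

pip install "langchain>=0.2" langchain-text-splitters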

Step by step

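Instantiate RecursiveCharacterTextSplitter with your desired chunk_size and chunk_overlap, both measured in characters by default. Then use split_text() to split a string into a list of strings, or split_documents() to split a list of LangChain Document objects into a list of smaller Document objects.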

python

# The splitter lives in langchain-text-splitters; Document lives in langchain-core.
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_core.documents import Document

# Example text
text = """LangChain helps you build applications with LLMs.\n""" * 10  # repeated text

# Initialize the splitter
splitter = RecursiveCharacterTextSplitter(
    chunk_size=100,
    chunk_overlap=20
)

# Split plain text
chunks = splitter.split_text(text)
print(f"Number of chunks: {len(chunks)}")
print(chunks[0])

# Split LangChain Document objects
docs = [Document(page_content=text)]
split_docs = splitter.split_documents(docs)
print(f"Number of split documents: {len(split_docs)}")
print(split_docs[0].page_content)

output

Number of chunks: 5
LangChain helps you build applications with LLMs.
LangChain helps you build applications with LLMs.
Number of split documents: 5
LangChain helps you build applications with LLMs.
LangChain helps you build applications with LLMs.

Common variations

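Raise chunk_size for fewer, longer chunks, and raise chunk_overlap to repeat more text between neighboring chunks. chunk_overlap must not exceed chunk_size.
Use split_documents() when working with LangChain Document objects so that each chunk carries a copy of its parent document's metadata.
Load files with a document loader such as PyPDFLoader first, then pass the resulting list of Document objects to split_documents().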

Troubleshooting

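If chunks are too small or too large, adjust chunk_size and chunk_overlap. Both are counted in characters by default, not tokens.
If you get errors on split_documents(), ensure the input is a list of Document objects. Wrap a single Document in a list, and use split_text() for plain strings.
For unexpected splits, check whether your text contains unusual characters or formatting, such as missing blank lines between paragraphs or non-standard whitespace, that keeps the default separators from matching.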
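
When diagnosing chunk sizes or odd split points, it can help to summarize chunk lengths and print each chunk with repr() so hidden characters become visible. The helper below is a hypothetical, standard-library-only sketch (describe_chunks is not part of LangChain) that works on any list of strings, such as the result of split_text().

```python
from statistics import mean


def describe_chunks(chunks, chunk_size):
    """Summarize chunk lengths so sizing problems are easy to spot."""
    lengths = [len(c) for c in chunks]
    return {
        "count": len(lengths),
        "min": min(lengths, default=0),
        "max": max(lengths, default=0),
        "mean": round(mean(lengths), 1) if lengths else 0.0,
        "oversized": sum(1 for n in lengths if n > chunk_size),
    }


# Made-up chunks; the second one hides a non-breaking space (U+00A0).
sample_chunks = ["alpha beta", "gamma\u00a0delta", "epsilon"]

report = describe_chunks(sample_chunks, chunk_size=10)
print(report)

# repr() escapes the non-breaking space as \xa0, which a plain print would hide.
for chunk in sample_chunks:
    print(repr(chunk))
```

A non-zero "oversized" count is a hint to look at custom separators or a custom length function, and escaped characters in the repr() output point to whitespace that the " " separator will not match.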


Key Takeaways

Use RecursiveCharacterTextSplitter to split text into chunks that break at paragraph, line, and word boundaries before falling back to individual characters.
Set chunk_size and chunk_overlap to control chunk length and overlap for better downstream processing.
Use split_documents for LangChain Document objects to keep metadata intact during splitting.

Verified 2026-04
