How-to · Beginner · 3 min read

How to use RecursiveCharacterTextSplitter in LangChain

Quick answer
Use RecursiveCharacterTextSplitter in LangChain to split large documents into smaller, overlapping chunks. It works recursively, trying a prioritized list of separators (paragraphs, then lines, then words, then individual characters), so semantically related text tends to stay together. Instantiate it with chunk_size and chunk_overlap, then call split_text() on a string or split_documents() on a list of Document objects.

PREREQUISITES

  • Python 3.8+
  • pip install "langchain>=0.2" (quoted so the shell doesn't treat > as a redirect)
  • Basic familiarity with LangChain document loaders

Setup

Install LangChain if you haven't already. Ensure you have Python 3.8 or higher.

bash
pip install "langchain>=0.2"

Step by step

Instantiate RecursiveCharacterTextSplitter with your desired chunk_size and chunk_overlap. Then use split_text() to split a string or split_documents() to split a list of LangChain Document objects.

python
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.schema import Document

# Example text
text = """LangChain helps you build applications with LLMs.\n""" * 10  # repeated text

# Initialize the splitter
splitter = RecursiveCharacterTextSplitter(
    chunk_size=100,
    chunk_overlap=20
)

# Split plain text
chunks = splitter.split_text(text)
print(f"Number of chunks: {len(chunks)}")
print(chunks[0])

# Split LangChain Document objects
docs = [Document(page_content=text)]
split_docs = splitter.split_documents(docs)
print(f"Number of split documents: {len(split_docs)}")
print(split_docs[0].page_content)

output
Number of chunks: 5
LangChain helps you build applications with LLMs.
LangChain helps you build applications with LLMs.
Number of split documents: 5
LangChain helps you build applications with LLMs.
LangChain helps you build applications with LLMs.

Common variations

  • Adjust chunk_size and chunk_overlap to balance chunk length and context overlap.
  • Use split_documents() when working with LangChain Document objects to preserve metadata.
  • Combine with document loaders like PyPDFLoader for PDFs before splitting.

Troubleshooting

  • If chunks come out larger or smaller than expected, remember chunk_size is an upper bound measured with len() by default; tune chunk_size and chunk_overlap, or pass a custom length_function (e.g. a token counter) to measure in tokens instead.
  • If you get errors on split_documents(), ensure input is a list of Document objects.
  • For unexpected splits, check your text's structure: the splitter tries separators in order ("\n\n", "\n", " ", "" by default), so text without paragraph or line breaks falls back to word- or character-level splits. Pass a custom separators list if your content uses different delimiters.

Key Takeaways

  • Use RecursiveCharacterTextSplitter to recursively split text into manageable chunks preserving context.
  • Set chunk_size and chunk_overlap to control chunk length and overlap for better downstream processing.
  • Use split_documents for LangChain Document objects to keep metadata intact during splitting.