How-to · Beginner · 3 min read

How to use RecursiveCharacterTextSplitter in LangChain

Quick answer
Use RecursiveCharacterTextSplitter from langchain.text_splitter to split large documents into smaller chunks. It tries a list of separators in order (paragraphs, then lines, then words, then individual characters), so chunks tend to fall on natural semantic boundaries. Instantiate it with parameters like chunk_size and chunk_overlap, then call split_text() on your input string to get a list of chunk strings.

PREREQUISITES

  • Python 3.8+
  • pip install "langchain>=0.2" (quoted so the shell doesn't treat >= as a redirect)
  • Basic knowledge of Python

Setup

Install LangChain if you haven't already. Ensure you have Python 3.8 or newer.

bash
pip install "langchain>=0.2"

Step by step

Import RecursiveCharacterTextSplitter, create an instance with your desired chunk_size and chunk_overlap, then split your text.

python
from langchain.text_splitter import RecursiveCharacterTextSplitter

text = (
    "LangChain is a powerful framework for building applications "
    "with language models. It provides utilities for text splitting, "
    "prompt management, and chaining calls to LLMs."
)

# Initialize the splitter
splitter = RecursiveCharacterTextSplitter(chunk_size=50, chunk_overlap=10)

# Split the text into chunks
chunks = splitter.split_text(text)

# Print the chunks
for i, chunk in enumerate(chunks, 1):
    print(f"Chunk {i}:", chunk)
output
Chunk 1: LangChain is a powerful framework for building
Chunk 2: building applications with language models. It
Chunk 3: models. It provides utilities for text splitting,
Chunk 4: splitting, prompt management, and chaining calls
Chunk 5: calls to LLMs.

Because chunk_overlap=10, each chunk begins with the last few characters of the previous one. Exact boundaries may vary slightly between LangChain versions.

Common variations

  • chunk_size caps the length of each chunk (measured in characters by default); chunk_overlap sets how many trailing characters are repeated at the start of the next chunk.
  • Use split_documents() to split a list of Document objects, or create_documents() to build Documents directly from raw strings.
  • Combine with LangChain's document loaders and embeddings for retrieval-augmented generation.
python
from langchain.schema import Document

# Example splitting multiple documents
docs = [Document(page_content=text)]
chunks_docs = splitter.split_documents(docs)

for i, doc in enumerate(chunks_docs, 1):
    print(f"Doc chunk {i}:", doc.page_content)
output
Doc chunk 1: LangChain is a powerful framework for building
Doc chunk 2: building applications with language models. It
Doc chunk 3: models. It provides utilities for text splitting,
Doc chunk 4: splitting, prompt management, and chaining calls
Doc chunk 5: calls to LLMs.

Troubleshooting

  • If chunks are too small or too large, adjust chunk_size and chunk_overlap.
  • Ensure input text is a string; otherwise, split_text() will raise an error.
  • For very large texts, consider increasing chunk_size to reduce the number of chunks.

Key Takeaways

  • Use RecursiveCharacterTextSplitter to split text recursively by characters with overlap.
  • Adjust chunk_size and chunk_overlap to optimize chunk length for your use case.
  • You can split both raw text strings and LangChain Document objects.
  • Proper chunking improves downstream tasks like embeddings and retrieval.
  • Always validate input types and tune parameters for best results.
Verified 2026-04