How to use RecursiveCharacterTextSplitter in LangChain
Quick answer
Use RecursiveCharacterTextSplitter from langchain.text_splitter to split large documents into smaller chunks recursively by characters, preserving semantic boundaries where possible. Instantiate it with parameters like chunk_size and chunk_overlap, then call split_text() on your input string to get the chunks.
Prerequisites
- Python 3.8+
- pip install "langchain>=0.2"
- Basic knowledge of Python
Setup
Install LangChain if you haven't already. Ensure you have Python 3.8 or newer.
pip install "langchain>=0.2"
Step by step
Import RecursiveCharacterTextSplitter, create an instance with your desired chunk_size and chunk_overlap, then split your text.
from langchain.text_splitter import RecursiveCharacterTextSplitter
text = """LangChain is a powerful framework for building applications with language models. """
text += """It provides utilities for text splitting, prompt management, and chaining calls to LLMs."""
# Initialize the splitter
splitter = RecursiveCharacterTextSplitter(chunk_size=50, chunk_overlap=10)
# Split the text into chunks
chunks = splitter.split_text(text)
# Print the chunks
for i, chunk in enumerate(chunks, 1):
    print(f"Chunk {i}:", chunk)
Output
Chunk 1: LangChain is a powerful framework for building
Chunk 2: applications with language models. It provides
Chunk 3: utilities for text splitting, prompt management,
Chunk 4: and chaining calls to LLMs.
Common variations
- Adjust chunk_size and chunk_overlap to control chunk length and overlap.
- Use split_documents() to split a list of Document objects.
- Combine with LangChain's document loaders and embeddings for retrieval-augmented generation.
from langchain.schema import Document
# Example splitting multiple documents
docs = [Document(page_content=text)]
chunks_docs = splitter.split_documents(docs)
for i, doc in enumerate(chunks_docs, 1):
    print(f"Doc chunk {i}:", doc.page_content)
Output
Doc chunk 1: LangChain is a powerful framework for building
Doc chunk 2: applications with language models. It provides
Doc chunk 3: utilities for text splitting, prompt management,
Doc chunk 4: and chaining calls to LLMs.
Troubleshooting
- If chunks are too small or too large, adjust chunk_size and chunk_overlap.
- Ensure the input text is a string; otherwise, split_text() will raise an error.
- For very large texts, consider increasing chunk_size to reduce the number of chunks.
Key Takeaways
- Use RecursiveCharacterTextSplitter to split text recursively by characters with overlap.
- Adjust chunk_size and chunk_overlap to optimize chunk length for your use case.
- You can split both raw text strings and LangChain Document objects.
- Proper chunking improves downstream tasks like embeddings and retrieval.
- Always validate input types and tune parameters for best results.