How to split text in LangChain

Quick answer
Use one of LangChain's built-in text splitters, such as RecursiveCharacterTextSplitter or CharacterTextSplitter, to divide long text into chunks. Both let you set a chunk size and an overlap between consecutive chunks, so each chunk fits within the input limit of an embedding model or LLM while sharing some context with its neighbors.

PREREQUISITES

  • Python 3.8+
  • pip install "langchain>=0.2" langchain-text-splitters
  • Basic Python knowledge

Setup

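Install LangChain together with the langchain-text-splitters package, which provides the splitter classes. Quote the version specifier so the shell does not treat >= as an output redirection.

bash
pip install "langchain>=0.2" langchain-text-splitters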

Step by step

This example demonstrates how to split a long text into chunks using RecursiveCharacterTextSplitter with a chunk size of 1000 characters and an overlap of 200 characters.

python
# Provided by the langchain-text-splitters package
from langchain_text_splitters import RecursiveCharacterTextSplitter

text = """LangChain is a powerful framework for building applications with language models. """ * 50  # Sample repeated text

splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
chunks = splitter.split_text(text)

for i, chunk in enumerate(chunks):
    print(f"Chunk {i+1} (length {len(chunk)}):")
    print(chunk[:200] + '...')  # Print first 200 chars
    print('---')
output (first chunk shown; N is the chunk's length, which never exceeds 1000)
Chunk 1 (length N):
LangChain is a powerful framework for building applications with language models. LangChain is a powerful framework for building applications with language models. LangChain is a powerful framework fo...
---
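
To make the roles of chunk_size and chunk_overlap concrete, here is a minimal pure-Python sketch of a fixed-size sliding window. It is an illustration only and not LangChain's algorithm: RecursiveCharacterTextSplitter prefers to break on separators such as paragraphs, lines, and spaces, whereas this hypothetical helper (sliding_window_chunks is a name invented for this example) cuts at exact character offsets.

```python
from typing import List


def sliding_window_chunks(text: str, chunk_size: int, chunk_overlap: int) -> List[str]:
    """Cut text into windows of chunk_size characters that share chunk_overlap characters.

    Hypothetical helper for illustration; LangChain's splitters break on separators instead.
    """
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks


sample = "abcdefghijklmnopqrstuvwxyz"
windows = sliding_window_chunks(sample, chunk_size=10, chunk_overlap=4)
for window in windows:
    print(window)
# abcdefghij
# ghijklmnop
# mnopqrstuv
# stuvwxyz
```

Each window repeats the last four characters of the previous one, which is what the overlap buys you: text that straddles a boundary still appears whole in at least one chunk.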

Common variations

  • Use CharacterTextSplitter to split on a single separator (by default a blank line, "\n\n") instead of falling back through a list of separators recursively.
  • Adjust chunk_size and chunk_overlap to balance chunk length and context overlap.
  • Use the split_documents method to split a list of Document objects instead of raw text; each resulting chunk keeps the metadata of its source document.
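python
# Provided by the langchain-text-splitters package
from langchain_text_splitters import CharacterTextSplitter

text = "This is a sample text to demonstrate splitting. " * 20
# CharacterTextSplitter splits on one separator, "\n\n" by default. This text has no
# blank lines, so split on spaces; otherwise the whole text comes back as one chunk.
splitter = CharacterTextSplitter(separator=" ", chunk_size=50, chunk_overlap=10)
chunks = splitter.split_text(text)

print(f"Total chunks: {len(chunks)}")
print(chunks[0])
output (N is the number of chunks)
Total chunks: N
This is a sample text to demonstrate splitting.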

Troubleshooting

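  • If chunks are too small or too large, adjust the chunk_size and chunk_overlap parameters. If CharacterTextSplitter returns chunks far larger than chunk_size, check that its separator (by default "\n\n") actually occurs in the text.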
  • Ensure your input text is a string; otherwise, split_text will raise an error.
  • For very large documents, consider splitting before embedding to avoid token limits.
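
The first two checks above can be automated with a couple of small helpers. This is a sketch in plain Python, not part of LangChain; ensure_text and find_oversized_chunks are hypothetical names introduced for this example, and decoding bytes as UTF-8 is an assumption about your input.

```python
from typing import List


def ensure_text(value) -> str:
    """Return value as a str, decoding bytes as UTF-8 (an assumption) and rejecting other types."""
    if isinstance(value, bytes):
        return value.decode("utf-8")
    if not isinstance(value, str):
        raise TypeError(f"expected str or bytes, got {type(value).__name__}")
    return value


def find_oversized_chunks(chunks: List[str], chunk_size: int) -> List[int]:
    """Return the indexes of chunks longer than chunk_size characters."""
    return [i for i, chunk in enumerate(chunks) if len(chunk) > chunk_size]


clean = ensure_text(b"hello world")
oversized = find_oversized_chunks(["short", "x" * 120, "also short"], chunk_size=50)
print(clean)      # hello world
print(oversized)  # [1]
```

Run ensure_text on the input before calling split_text, and find_oversized_chunks on the result to spot chunks that slipped past the configured size.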

Key Takeaways

  • Use LangChain's text splitters like RecursiveCharacterTextSplitter to chunk text efficiently.
  • Adjust chunk_size and chunk_overlap to optimize chunk length and context retention.
  • Split raw text or Document objects depending on your workflow needs.
Verified 2026-04