
How to split documents in LangChain

Quick answer
Use one of LangChain's built-in text splitters, such as RecursiveCharacterTextSplitter or CharacterTextSplitter, to divide documents into chunks small enough to embed or pass to a language model. RecursiveCharacterTextSplitter tries paragraph, line, and word boundaries in turn until each chunk fits; CharacterTextSplitter splits on a single separator.

Prerequisites

  • Python 3.8+
  • pip install "langchain>=0.2" langchain-community langchain-text-splitters
  • Basic familiarity with LangChain document objects

Setup

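Install LangChain together with the community package (which provides TextLoader) and the text splitters package. Quote the version specifier so the shell does not treat > as an output redirect.

bash
pip install "langchain>=0.2" langchain-community langchain-text-splitters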

Step by step

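This example splits a plain-text file into chunks using RecursiveCharacterTextSplitter. The splitter works through a list of separators (by default paragraph breaks, then line breaks, then spaces, then individual characters) and only falls back to a finer separator when a piece is still longer than chunk_size. chunk_overlap controls how many characters may be repeated between consecutive chunks.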

python
from langchain_community.document_loaders import TextLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter

# Load a sample document
loader = TextLoader("sample.txt")
docs = loader.load()

# Initialize the splitter; sizes are measured in characters by default
splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)

# Split the documents
split_docs = splitter.split_documents(docs)

# Print the first chunk content
print(split_docs[0].page_content)
output
(The first chunk of sample.txt: at most 500 characters, with up to 50 of them repeated at the start of the next chunk.)
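
The separator fallback used by the recursive splitter can be illustrated without LangChain. What follows is a minimal, dependency-free sketch and not the library's implementation: recursive_split is a hypothetical helper that tries the coarsest separator first, merges neighboring pieces while they still fit, and recurses to a finer separator only for pieces that remain too long. It ignores chunk_overlap and drops the separator at chunk boundaries, both of which are simplifications.

```python
def recursive_split(text, chunk_size, separators=("\n\n", "\n", " ", "")):
    """Split text into chunks of at most chunk_size characters.

    Hypothetical helper for illustration only (no overlap handling).
    """
    if len(text) <= chunk_size:
        return [text] if text else []
    if not separators or separators[0] == "":
        # Last resort: hard cut every chunk_size characters.
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

    sep, finer = separators[0], separators[1:]
    chunks, current = [], ""
    for part in text.split(sep):
        candidate = part if not current else current + sep + part
        if len(candidate) <= chunk_size:
            # Keep merging neighboring pieces while they fit.
            current = candidate
            continue
        if current:
            chunks.append(current)
        if len(part) <= chunk_size:
            current = part
        else:
            # This piece alone is too long: retry with a finer separator.
            chunks.extend(recursive_split(part, chunk_size, finer))
            current = ""
    if current:
        chunks.append(current)
    return chunks


text = (
    "First paragraph.\n\n"
    "Second paragraph is a bit longer than the first.\n\n"
    "Third."
)
for chunk in recursive_split(text, chunk_size=30):
    print(repr(chunk))
```

With chunk_size=30, the first and third paragraphs stay whole, and only the middle paragraph, which is too long on its own, is re-split on spaces.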

Common variations

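Other splitters cover different needs. CharacterTextSplitter splits on a single separator (a blank line by default) and merges the pieces up to chunk_size, so a piece that contains no separator can exceed the limit. MarkdownTextSplitter prefers Markdown boundaries such as headings. Splitters also inherit an async atransform_documents method from LangChain's document transformer interface, for use in async code.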

python
from langchain_text_splitters import CharacterTextSplitter

# Reuses `docs` loaded with TextLoader in the previous example
splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=100)
split_docs = splitter.split_documents(docs)

print(f"Number of chunks: {len(split_docs)}")
output
Number of chunks: X  (X depends on the document's length and where its separators fall)
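
To make "markdown-aware" splitting concrete, here is a rough sketch that uses only the standard library. It is not how MarkdownTextSplitter is implemented; split_markdown_sections is a hypothetical helper that simply cuts before every heading line, so each section keeps its heading. As a known limitation, it would also cut at lines starting with # inside fenced code blocks.

```python
import re


def split_markdown_sections(markdown):
    """Split a Markdown string into sections, one per heading.

    Hypothetical helper for illustration only.
    """
    # Zero-width split before any line that starts with 1-6 '#' and a space.
    parts = re.split(r"(?m)^(?=#{1,6} )", markdown)
    return [part.strip() for part in parts if part.strip()]


markdown = (
    "# Title\n"
    "Intro text.\n"
    "\n"
    "## Setup\n"
    "Install things.\n"
    "\n"
    "## Usage\n"
    "Run it.\n"
)
for section in split_markdown_sections(markdown):
    print(repr(section))
```

Splitting at headings keeps related text together, which is the same motivation behind choosing a format-aware splitter over a purely size-based one.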

Troubleshooting

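  • If chunks are too large or too small, adjust chunk_size. Both chunk_size and chunk_overlap are counted in characters by default, and chunk_overlap cannot be larger than chunk_size.
  • If splitting breaks sentences awkwardly, use RecursiveCharacterTextSplitter, which tries paragraph, line, and word boundaries before cutting mid-word.
  • Make sure your document loader returns a list of Document objects; split_documents reads each one's page_content and metadata.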
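
When tuning these parameters, it can help to see how chunk size and overlap trade off against chunk count. The sketch below is a plain sliding window written for illustration, not LangChain's algorithm (the library splits on separators first, so its boundaries will differ). sliding_chunks is a hypothetical helper; each new chunk starts chunk_size - chunk_overlap characters after the previous one.

```python
def sliding_chunks(text, chunk_size, chunk_overlap):
    """Cut text into fixed-size windows that share chunk_overlap characters.

    Hypothetical helper for illustration only.
    """
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    if not text:
        return []
    step = chunk_size - chunk_overlap
    # Stop once a window has reached the end of the text, so the last
    # chunk is never fully contained in the one before it.
    last_start = max(len(text) - chunk_overlap, 1)
    return [text[i:i + chunk_size] for i in range(0, last_start, step)]


text = "abcdefghij"
print(sliding_chunks(text, chunk_size=4, chunk_overlap=0))
print(sliding_chunks(text, chunk_size=4, chunk_overlap=2))
```

More overlap preserves more context across boundaries but produces more chunks for the same text, which means more embeddings to compute and store.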

Key Takeaways

  • Use LangChain's text splitters to break documents into chunks that fit the input limits of an embedding model or language model.
  • Adjust chunk size and overlap to balance context retention and chunk count.
  • Choose splitters based on document type and desired splitting granularity.
Verified 2026-04