How-to · Beginner · 3 min read

How to use CharacterTextSplitter in LangChain

Quick answer
Use CharacterTextSplitter from langchain.text_splitter to break large text into chunks. It first splits the text on a separator (the default is "\n\n", i.e. paragraph breaks) and then merges the pieces into chunks governed by chunk_size and chunk_overlap. Instantiate it with those parameters, then call split_text() on your input string to get a list of text chunks.

PREREQUISITES

  • Python 3.8+
  • pip install "langchain>=0.2"
  • Basic Python knowledge

Setup

Install LangChain if you haven't already. CharacterTextSplitter is available from the langchain.text_splitter module (in recent releases it is re-exported from the langchain-text-splitters package, which ships as a dependency of langchain).

bash
pip install "langchain>=0.2"

Step by step

Import CharacterTextSplitter, create an instance with the desired separator, chunk_size, and chunk_overlap, then call split_text() on your text to get chunks.

python
from langchain.text_splitter import CharacterTextSplitter

text = (
    "LangChain is a powerful framework for building applications with large language models. "
    "It provides tools for prompt management, chaining, memory, and text splitting. "
    "CharacterTextSplitter splits text into chunks by characters, useful for chunking documents."
)

# The default separator is "\n\n" (paragraph breaks), and this text has none,
# so split on spaces instead to get chunks of at most 50 characters
# with up to 10 characters of overlap
splitter = CharacterTextSplitter(separator=" ", chunk_size=50, chunk_overlap=10)

chunks = splitter.split_text(text)

for i, chunk in enumerate(chunks, 1):
    print(f"Chunk {i}: {chunk}\n")
output
Chunk 1: LangChain is a powerful framework for building

Chunk 2: building applications with large language models.

Chunk 3: models. It provides tools for prompt management,

Chunk 4: chaining, memory, and text splitting.

Chunk 5: splitting. CharacterTextSplitter splits text into

Chunk 6: text into chunks by characters, useful for

Chunk 7: useful for chunking documents.

Because the splitter reuses only whole space-delimited pieces from the end of the previous chunk, the overlap is at most 10 characters, and it can be zero when the last piece alone is longer than chunk_overlap (as between chunks 3 and 4).
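Under the hood, fixed-size chunking with overlap is just a sliding window. This pure-Python sketch is illustrative only (CharacterTextSplitter actually splits on a separator first and merges pieces, so real chunk boundaries differ), but it shows the core idea:

```python
def naive_character_chunks(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Slide a window of chunk_size characters, stepping by chunk_size - chunk_overlap."""
    step = chunk_size - chunk_overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

chunks = naive_character_chunks("abcdefghij", chunk_size=5, chunk_overlap=1)
print(chunks)  # ['abcde', 'efghi', 'ij']
```

Each chunk begins chunk_overlap characters before the previous one ended, which is what gives downstream consumers shared context across chunk boundaries.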

Common variations

  • Adjust chunk_size and chunk_overlap to control chunk length and context overlap.
  • Use split_documents() to split a list of Document objects.
  • Use RecursiveCharacterTextSplitter instead when you want the splitter to fall back through a hierarchy of separators (paragraphs, then sentences, then words) so chunks respect natural boundaries.
python
from langchain.schema import Document

# split_documents() applies the same splitting to Document objects,
# carrying each document's metadata over to its chunks
splitter = CharacterTextSplitter(separator=" ", chunk_size=50, chunk_overlap=10)
docs = [Document(page_content=text), Document(page_content=text)]
chunks = splitter.split_documents(docs)
print(f"Total chunks from documents: {len(chunks)}")
output
Total chunks from documents: 14
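The hierarchical idea behind RecursiveCharacterTextSplitter can be sketched in a few lines of plain Python (an illustration of the concept, not the library's implementation): try the coarsest separator first, and only recurse to finer separators for pieces that are still too long.

```python
def recursive_split(text: str, separators: list[str], chunk_size: int) -> list[str]:
    """Split on the first separator; recurse with finer separators on oversized pieces."""
    if len(text) <= chunk_size:
        return [text]
    if not separators:
        # no separators left: fall back to hard slicing
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
    first, rest = separators[0], separators[1:]
    out: list[str] = []
    for piece in text.split(first):
        if len(piece) > chunk_size:
            out.extend(recursive_split(piece, rest, chunk_size))
        elif piece:
            out.append(piece)
    return out

parts = recursive_split("para one.\n\nsentence a. sentence b. " + "x" * 30,
                        separators=["\n\n", ". ", " "], chunk_size=20)
# every part is at most 20 characters; paragraph and sentence boundaries are preferred
```

Note that this sketch drops the separators and has no overlap; the real splitter handles both.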

Troubleshooting

  • If chunks are too small or too large, adjust chunk_size and chunk_overlap (chunk_overlap cannot exceed chunk_size).
  • A warning like "Created a chunk of size …, which is longer than the specified …" means a single separator-delimited piece exceeded chunk_size; use a finer separator (e.g. " ") or switch to RecursiveCharacterTextSplitter.
  • Ensure the input text is a string; otherwise, split_text() will raise an error.
  • For very large texts, read and split the input in segments to avoid loading everything into memory at once.

Key Takeaways

  • Use CharacterTextSplitter to split text into fixed-size character chunks with overlap.
  • Customize chunk_size and chunk_overlap to balance chunk length and context.
  • You can split both raw strings and Document objects using split_text() and split_documents() respectively.
Verified 2026-04