How to use SemanticChunker in LangChain
Quick answer
Use SemanticChunker in LangChain to split documents into semantically meaningful chunks by leveraging embeddings and similarity. Initialize it with an embedding model and call split_text() on your text to get context-aware chunks.
Prerequisites
- Python 3.8+
- pip install langchain>=0.2.0 langchain-experimental (SemanticChunker lives in langchain_experimental)
- OpenAI API key (free tier works)
- pip install langchain-openai openai>=1.0
Setup
Install the langchain, langchain-experimental, and langchain-openai packages, and set your OpenAI API key as an environment variable.
- Install packages:
pip install langchain langchain-experimental langchain-openai openai
- Set environment variable:
export OPENAI_API_KEY='your_api_key' (Linux/macOS) or setx OPENAI_API_KEY "your_api_key" (Windows)
Step by step
This example shows how to use SemanticChunker with OpenAIEmbeddings to chunk a long text semantically.
import os
from langchain_experimental.text_splitter import SemanticChunker
from langchain_openai import OpenAIEmbeddings
# Initialize embeddings with OpenAI
embeddings = OpenAIEmbeddings(openai_api_key=os.environ["OPENAI_API_KEY"])
# Create SemanticChunker with embeddings
chunker = SemanticChunker(embeddings, breakpoint_threshold_type="percentile")
# Example long text
text = (
"LangChain is a framework for developing applications powered by language models. "
"SemanticChunker splits text into chunks based on semantic similarity rather than fixed length, "
"improving retrieval and context relevance in downstream tasks."
)
# Chunk the text
chunks = chunker.split_text(text)
# Print chunks
for i, chunk in enumerate(chunks, 1):
print(f"Chunk {i}: {chunk}\n")
Output
Chunk 1: LangChain is a framework for developing applications powered by language models.
Chunk 2: SemanticChunker splits text into chunks based on semantic similarity rather than fixed length, improving retrieval and context relevance in downstream tasks.
(Exact chunk boundaries depend on the embedding model's similarity scores, so your output may vary.)
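Under the hood, SemanticChunker's default "percentile" strategy embeds each sentence, measures the distance between neighbouring embeddings, and splits wherever that distance exceeds a chosen percentile. A minimal dependency-free sketch of that idea (the toy 2-D vectors and helper names here are illustrative, not LangChain API):

```python
import math

def cosine_distance(a, b):
    # 1 - cosine similarity between two vectors
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm

def percentile(values, pct):
    # Simple nearest-rank percentile
    ordered = sorted(values)
    k = max(0, min(len(ordered) - 1, round(pct / 100 * (len(ordered) - 1))))
    return ordered[k]

def semantic_split(sentences, vectors, pct=50):
    # Split after any sentence whose distance to the next exceeds the threshold
    distances = [cosine_distance(vectors[i], vectors[i + 1])
                 for i in range(len(vectors) - 1)]
    threshold = percentile(distances, pct)
    chunks, current = [], [sentences[0]]
    for i, d in enumerate(distances):
        if d > threshold:
            chunks.append(" ".join(current))
            current = []
        current.append(sentences[i + 1])
    chunks.append(" ".join(current))
    return chunks

# Toy example: the first two "sentences" point the same way, the third is orthogonal
sentences = ["Cats purr.", "Kittens meow.", "Stocks fell today."]
vectors = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
print(semantic_split(sentences, vectors, pct=50))
# → ['Cats purr. Kittens meow.', 'Stocks fell today.']
```

The real implementation works on embedding vectors from your chosen model rather than hand-written toy vectors, but the breakpoint logic is the same shape: related sentences stay together, and a large semantic jump starts a new chunk.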
Common variations
- Adjust breakpoint_threshold_type ("percentile", "standard_deviation", "interquartile", or "gradient") and breakpoint_threshold_amount to control where splits occur.
- Use different embedding models by passing other Embeddings implementations.
- Integrate SemanticChunker with LangChain document loaders and retrievers for enhanced pipelines; create_documents() returns Document objects ready for indexing.
Troubleshooting
- If chunks are too small or too large, tune breakpoint_threshold_amount (and buffer_size, which controls how many neighbouring sentences are grouped before comparison).
- Ensure your OpenAI API key is set correctly in os.environ["OPENAI_API_KEY"].
- Check network connectivity if embedding calls fail.
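The tuning advice above can be made concrete: for the same set of neighbour distances, a higher percentile threshold (breakpoint_threshold_amount with the default "percentile" strategy) yields fewer breakpoints and therefore larger chunks. A hypothetical standalone sketch, not LangChain code:

```python
def count_breakpoints(distances, pct):
    # Nearest-rank percentile threshold; distances strictly above it become split points
    ordered = sorted(distances)
    k = max(0, min(len(ordered) - 1, round(pct / 100 * (len(ordered) - 1))))
    threshold = ordered[k]
    return sum(1 for d in distances if d > threshold)

# Toy neighbour distances between consecutive sentence embeddings
distances = [0.05, 0.10, 0.80, 0.12, 0.65, 0.08]

for pct in (50, 75, 95):
    # Higher percentile -> fewer splits -> larger chunks
    print(pct, count_breakpoints(distances, pct))
# → 50 3
#   75 1
#   95 0
```

So if your chunks come out too small, raise the threshold amount; if they are too large, lower it.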
Key Takeaways
- Use SemanticChunker with embeddings to split text semantically, improving context relevance.
- Tune breakpoint_threshold_type and breakpoint_threshold_amount for optimal chunk granularity.
- Integrate SemanticChunker with LangChain pipelines for better document retrieval and QA.