Intermediate · 4 min read

Fix poor RAG retrieval from bad chunking

Quick answer
Poor RAG retrieval caused by bad chunking is usually fixed by choosing sensible chunk sizes (typically 500-1000 tokens), adding overlap between chunks, and splitting on semantic boundaries such as sentences or paragraphs. Libraries like nltk and langchain let you chunk documents properly before embedding and indexing.

Prerequisites

  • Python 3.8+
  • OpenAI API key (free tier works)
  • pip install openai>=1.0 langchain langchain-community langchain-openai faiss-cpu nltk

Setup

Install required packages and set your environment variable for the OpenAI API key.

bash
pip install openai langchain langchain-community langchain-openai faiss-cpu nltk

Step by step

This example shows how to fix poor RAG retrieval by chunking text documents with sentence boundaries and overlap, then embedding and querying with OpenAI embeddings and FAISS.

python
import os
from langchain_community.vectorstores import FAISS
from langchain_openai import OpenAIEmbeddings
import nltk
from nltk.tokenize import sent_tokenize

# Download sentence-tokenizer data (newer NLTK releases use punkt_tab)
nltk.download('punkt')
nltk.download('punkt_tab')

# Load your document text
text = """Your long document text goes here. It can be multiple paragraphs.
Chunking by sentences with overlap improves retrieval quality."""

# Chunk text by sentences, repeating ~overlap words between chunks

def chunk_text(text, max_tokens=500, overlap=50):
    """Split text into chunks of at most max_tokens words, repeating
    roughly `overlap` trailing words at the start of the next chunk.
    Word counts are a cheap proxy for model tokens here."""
    sentences = sent_tokenize(text)
    chunks = []
    current_chunk = []
    current_length = 0

    for sentence in sentences:
        sentence_length = len(sentence.split())
        if current_chunk and current_length + sentence_length > max_tokens:
            chunks.append(' '.join(current_chunk))
            # Overlap: keep trailing sentences totalling at most `overlap` words
            kept, kept_length = [], 0
            for s in reversed(current_chunk):
                s_length = len(s.split())
                if kept_length + s_length > overlap:
                    break
                kept.insert(0, s)
                kept_length += s_length
            current_chunk, current_length = kept, kept_length
        current_chunk.append(sentence)
        current_length += sentence_length

    if current_chunk:
        chunks.append(' '.join(current_chunk))
    return chunks

chunks = chunk_text(text)

# Initialize OpenAI embeddings client
embeddings = OpenAIEmbeddings(api_key=os.environ["OPENAI_API_KEY"])

# Create FAISS vector store from chunks
vectorstore = FAISS.from_texts(chunks, embeddings)

# Query example
query = "How does chunking affect retrieval quality?"
query_embedding = embeddings.embed_query(query)

# Retrieve top 3 relevant chunks
results = vectorstore.similarity_search_by_vector(query_embedding, k=3)

print("Top retrieved chunks:")
for i, doc in enumerate(results, 1):
    print(f"Chunk {i}: {doc.page_content}\n")
output
Top retrieved chunks:
Chunk 1: Your long document text goes here. It can be multiple paragraphs. Chunking by sentences with overlap improves retrieval quality.

The sample text fits in a single chunk, so only one result comes back even with k=3; a real document produces multiple chunks and up to three results.
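To see the sliding-window overlap without calling the API, here is a stdlib-only sketch of the same logic operating on a pre-split sentence list. The function name `chunk_sentences` and the four-sentence toy input are made up for illustration; word counts again stand in for tokens.

```python
def chunk_sentences(sentences, max_tokens=8, overlap=3):
    """Same sliding-window logic as chunk_text, but taking an
    already-split sentence list; counts words, not model tokens."""
    chunks, current, length = [], [], 0
    for sentence in sentences:
        n = len(sentence.split())
        if current and length + n > max_tokens:
            chunks.append(' '.join(current))
            # keep trailing sentences totalling at most `overlap` words
            kept, kept_len = [], 0
            for s in reversed(current):
                if kept_len + len(s.split()) > overlap:
                    break
                kept.insert(0, s)
                kept_len += len(s.split())
            current, length = kept, kept_len
        current.append(sentence)
        length += n
    if current:
        chunks.append(' '.join(current))
    return chunks

sentences = ["Cats purr.", "Dogs bark.", "Birds sing.", "Fish swim."]
print(chunk_sentences(sentences, max_tokens=4, overlap=2))
# → ['Cats purr. Dogs bark.', 'Dogs bark. Birds sing.', 'Birds sing. Fish swim.']
```

Note how each chunk repeats the last sentence of the previous one; that repetition is what keeps a query from falling into the gap between two chunks.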

Common variations

  • Use paragraph-based chunking by splitting on double newlines for more natural boundaries.
  • Adjust max_tokens and overlap based on your model's context window and retrieval needs.
  • Use langchain built-in RecursiveCharacterTextSplitter or TokenTextSplitter for token-based chunking.
  • For async retrieval, use async versions of embedding and vector store calls if supported.
python
# Newer langchain versions ship splitters in the langchain-text-splitters
# package; the legacy import path was langchain.text_splitter
from langchain_text_splitters import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=500,    # measured in characters by default, not tokens
    chunk_overlap=50,
    separators=["\n\n", "\n", ".", "!", "?", ",", " "]
)
chunks = text_splitter.split_text(text)

# Then proceed with embeddings and vector store as before
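The paragraph-based variation from the list above can be sketched with no dependencies at all. `chunk_paragraphs` and `max_chars` are illustrative names, and character counts are used here instead of tokens:

```python
def chunk_paragraphs(text, max_chars=1500):
    """Split on blank lines, then greedily merge paragraphs
    until a chunk would exceed max_chars characters."""
    paragraphs = [p.strip() for p in text.split('\n\n') if p.strip()]
    chunks, current = [], ''
    for p in paragraphs:
        # +2 accounts for the blank line re-inserted between paragraphs
        if current and len(current) + len(p) + 2 > max_chars:
            chunks.append(current)
            current = p
        else:
            current = f"{current}\n\n{p}" if current else p
    if current:
        chunks.append(current)
    return chunks
```

Because paragraphs are merged rather than split, a single paragraph longer than max_chars still becomes one oversized chunk; combine this with a sentence splitter if your documents contain very long paragraphs.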

Troubleshooting

  • If retrieval returns irrelevant chunks, verify chunk size is neither too small (losing context) nor too large (exceeding model limits).
  • Ensure overlap is sufficient (typically 10-20% of chunk size) to maintain context continuity.
  • Check that your tokenizer count matches chunk size units (tokens vs words).
  • Use semantic-aware splitters to avoid cutting sentences or paragraphs mid-way.
  • Confirm embeddings are generated correctly and vector store is updated after chunking changes.
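The size and overlap checks above can be automated with a small sanity-check helper. `audit_chunks` is a hypothetical name, and word counts approximate tokens:

```python
def audit_chunks(chunks, max_tokens=500, min_overlap=1):
    """Flag chunks that exceed the size budget or share no trailing/leading
    words with the next chunk (word counts approximate tokens)."""
    issues = []
    for i, chunk in enumerate(chunks):
        n = len(chunk.split())
        if n > max_tokens:
            issues.append(f"chunk {i} exceeds {max_tokens} tokens ({n} words)")
    for i in range(len(chunks) - 1):
        a, b = chunks[i].split(), chunks[i + 1].split()
        # longest suffix of chunk i that is also a prefix of chunk i+1
        shared = max((k for k in range(1, min(len(a), len(b)) + 1)
                      if a[-k:] == b[:k]), default=0)
        if shared < min_overlap:
            issues.append(f"chunks {i} and {i + 1} do not overlap")
    return issues

print(audit_chunks(["one two three four", "three four five six"], max_tokens=10))
# → []  (sizes within budget, two words of overlap)
```

Run a check like this whenever you change the chunking strategy, before re-embedding the corpus.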

Key Takeaways

  • Use sentence or paragraph boundaries with overlap to chunk documents for RAG retrieval.
  • Optimal chunk size is typically 500-1000 tokens with 10-20% overlap for best context retention.
  • Leverage libraries like langchain's RecursiveCharacterTextSplitter for robust chunking.
  • Verify tokenizer counts and embedding updates after chunking changes to avoid retrieval errors.
  • Adjust chunking strategy based on your model's context window and retrieval quality feedback.
Verified 2026-04 · gpt-4o, text-embedding-3-small