What chunk overlap to use for RAG

Quick answer

Use a chunk overlap of 10-30% of the chunk size for RAG workflows. This keeps text that straddles a chunk boundary retrievable from at least one chunk without storing excessive duplicate content. For example, with a chunk size of 500 tokens, that means an overlap of 50-150 tokens.
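The arithmetic behind the 10-30% guideline can be sketched as a small helper. The function name `overlap_for` is hypothetical and only illustrates the calculation; the result is in the same units (tokens or characters) as the chunk size you pass in.

```python
def overlap_for(chunk_size: int, ratio: float) -> int:
    """Return the overlap for a given ratio, in the same units as chunk_size."""
    if not 0.0 <= ratio < 1.0:
        raise ValueError("ratio must be in the range [0, 1)")
    return round(chunk_size * ratio)

# The 10-30% range applied to a 500-token chunk
for ratio in (0.10, 0.20, 0.30):
    print(f"{ratio:.0%} of 500 -> {overlap_for(500, ratio)}")
# 10% of 500 -> 50
# 20% of 500 -> 100
# 30% of 500 -> 150
```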

Prerequisites

Python 3.8+
OpenAI API key (free tier works)
pip install "openai>=1.0"

Setup

Install the openai Python package and set your API key as an environment variable for secure access.

bash

pip install "openai>=1.0"
export OPENAI_API_KEY="your-api-key"

Step by step

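This example splits a document into fixed-size chunks with 20% overlap (100 out of every 500 units), suitable for RAG pipelines. Note that the function slices the string directly, so chunk_size and overlap count characters here, not tokens.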

python

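import os
from openai import OpenAI

# Client for later embedding or generation calls; the chunking itself does not use it.
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

def chunk_text(text, chunk_size=500, overlap=100):
    """Split text into windows of chunk_size characters that share overlap characters."""
    if not 0 <= overlap < chunk_size:
        # overlap >= chunk_size would never advance the window (infinite loop)
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    chunks = []
    start = 0
    text_length = len(text)
    while start < text_length:
        end = min(start + chunk_size, text_length)
        chunks.append(text[start:end])
        if end == text_length:
            break  # tail reached; skip a final chunk that would only repeat the overlap
        start += chunk_size - overlap
    return chunks

# Example usage (sizes are in characters)
sample_text = """Your long document text goes here. It can be several thousand tokens long."""
chunks = chunk_text(sample_text, chunk_size=500, overlap=100)
print(f"Generated {len(chunks)} chunks with 20% overlap.")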

output

Generated X chunks with 20% overlap.
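The function above slices by character, while the guideline is stated in tokens. A minimal sketch of the same sliding window over a token list follows. The name `chunk_tokens` is hypothetical, and whitespace splitting stands in for a real tokenizer; that is an assumption, so swap in the tokenizer that matches your embedding model.

```python
from typing import List, Sequence

def chunk_tokens(tokens: Sequence[str], chunk_size: int = 500,
                 overlap: int = 100) -> List[List[str]]:
    """Slide a window of chunk_size tokens, stepping by chunk_size - overlap."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(list(tokens[start:start + chunk_size]))
        if start + chunk_size >= len(tokens):
            break  # this window already covers the tail
    return chunks

# Synthetic 1,200-word text; str.split() is a stand-in tokenizer (assumption)
text = " ".join(f"tok{i}" for i in range(1200))
tokens = text.split()
chunks = chunk_tokens(tokens, chunk_size=500, overlap=100)
print(len(chunks), [len(c) for c in chunks])  # 3 [500, 500, 400]
```

Each chunk starts with the last 100 tokens of the previous one, which is the 20% overlap used in the example above.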

Common variations

You can adjust overlap based on document type and retrieval needs:

Lower overlap (10%) for highly structured data like tables or code.
Higher overlap (30%) for narrative or conversational text to preserve context.
Use semantic chunking, which compares embeddings of adjacent passages and places chunk boundaries where the topic shifts, so less fixed overlap is needed.

Overlap %	Use case	Effect
10%	Structured data (tables, code)	Less redundancy, faster retrieval
20%	General documents	Balanced context and efficiency
30%	Narrative or conversational text	Better context continuity, more tokens
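One possible reading of the semantic chunking option is to start a new chunk wherever two adjacent sentences stop being similar. The sketch below illustrates that idea only. `toy_embed`, `cosine`, `semantic_chunks`, and the 0.2 threshold are all hypothetical, and the bag-of-words vector is a stand-in for a real embedding model, so treat it as a starting point rather than a reference implementation.

```python
import math
import re
from collections import Counter
from typing import Callable, List

def toy_embed(sentence: str) -> Counter:
    """Bag-of-words counts; a stand-in for a real embedding model (assumption)."""
    return Counter(re.findall(r"\w+", sentence.lower()))

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def semantic_chunks(sentences: List[str],
                    embed: Callable[[str], Counter] = toy_embed,
                    threshold: float = 0.2) -> List[str]:
    """Start a new chunk when adjacent sentences fall below the similarity threshold."""
    if not sentences:
        return []
    groups = [[sentences[0]]]
    for prev, cur in zip(sentences, sentences[1:]):
        if cosine(embed(prev), embed(cur)) < threshold:
            groups.append([])  # topic shift: open a new chunk
        groups[-1].append(cur)
    return [" ".join(g) for g in groups]

sentences = [
    "Cats purr when happy.",
    "Happy cats also knead.",
    "Invoices are due monthly.",
    "Monthly invoices list totals.",
]
for chunk in semantic_chunks(sentences):
    print(chunk)
```

With a real embedding model, the threshold would need tuning on your own documents.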

Troubleshooting

If RAG answers are poor or miss context that spans a chunk boundary, raise the overlap in steps of 10 percentage points (for example, from 10% to 20% of the chunk size) and re-evaluate after each change. Conversely, if retrieval latency or token usage is too high, reduce the overlap.
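To see what each overlap step costs, you can estimate the chunk count and the total volume stored in the index. This is a sketch that assumes a fixed-size sliding window that stops once it reaches the end of the document; `chunk_count` and `stored_units` are hypothetical helper names.

```python
def chunk_count(total: int, chunk_size: int, overlap: int) -> int:
    """Number of chunks a sliding window produces for `total` units."""
    if total <= 0:
        return 0
    if total <= chunk_size:
        return 1
    step = chunk_size - overlap
    # integer ceiling of (total - chunk_size) / step
    return 1 + -(-(total - chunk_size) // step)

def stored_units(total: int, chunk_size: int, overlap: int) -> int:
    """Units stored across all chunks: each chunk after the first repeats `overlap` units."""
    n = chunk_count(total, chunk_size, overlap)
    return max(total, 0) + max(n - 1, 0) * overlap

# A 10,000-token document with 500-token chunks
for overlap in (50, 100, 150):
    n = chunk_count(10_000, 500, overlap)
    stored = stored_units(10_000, 500, overlap)
    print(f"overlap={overlap}: {n} chunks, {stored} tokens stored ({stored / 10_000:.2f}x)")
# overlap=50: 23 chunks, 11100 tokens stored (1.11x)
# overlap=100: 25 chunks, 12400 tokens stored (1.24x)
# overlap=150: 29 chunks, 14200 tokens stored (1.42x)
```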

Also, ensure chunk boundaries do not split sentences or semantic units to avoid context loss.
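One way to keep boundaries on sentence ends is to pack whole sentences into each chunk and carry the last sentence or two forward as the overlap. The sketch below assumes a naive regex splitter that breaks after ".", "!", or "?"; the helper names are hypothetical, and `max_chars` is a soft limit that a long sentence plus its carried overlap can exceed.

```python
import re
from typing import List

def split_sentences(text: str) -> List[str]:
    """Naive splitter: break after ., !, or ? followed by whitespace (assumption)."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

def chunk_sentences(text: str, max_chars: int = 500,
                    overlap_sentences: int = 1) -> List[str]:
    """Pack whole sentences into chunks, repeating the last sentences as overlap."""
    chunks: List[str] = []
    current: List[str] = []
    fresh = 0  # sentences in `current` that no emitted chunk contains yet
    for sentence in split_sentences(text):
        candidate = " ".join(current + [sentence])
        if fresh and len(candidate) > max_chars:
            chunks.append(" ".join(current))
            # carry the trailing sentences forward as the overlap
            current = current[-overlap_sentences:] if overlap_sentences > 0 else []
            fresh = 0
        current.append(sentence)
        fresh += 1
    if fresh:
        chunks.append(" ".join(current))
    return chunks

text = "Alpha one. Beta two. Gamma three. Delta four."
for chunk in chunk_sentences(text, max_chars=22, overlap_sentences=1):
    print(chunk)
# Alpha one. Beta two.
# Beta two. Gamma three.
# Gamma three. Delta four.
```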

Key Takeaways

Use 10-30% chunk overlap to balance context continuity and efficiency in RAG.
Adjust overlap based on document type: lower for structured data, higher for narratives.
Avoid splitting sentences or semantic units when chunking to preserve meaning.

Verified 2026-04 · gpt-4o, claude-3-5-sonnet-20241022
