What chunk overlap to use for RAG

Quick answer
Use a chunk overlap of 10-30% of the chunk size for RAG workflows. This keeps text that straddles a chunk boundary available in at least one chunk without adding excessive redundancy. For example, with a chunk size of 500 tokens, an overlap of 50-150 tokens is a reasonable starting range.
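
The arithmetic behind this rule of thumb can be sketched in a few lines. The helper name `overlap_for` is hypothetical and not part of any library; it only converts a percentage into an overlap size and the resulting stride (how far each new chunk advances).

```python
from typing import Tuple

def overlap_for(chunk_size: int, overlap_pct: float) -> Tuple[int, int]:
    """Return (overlap, stride) for a chunk size and an overlap percentage.

    Hypothetical helper for illustration; units are whatever chunk_size uses.
    """
    if not 0 <= overlap_pct < 100:
        raise ValueError("overlap_pct must be in [0, 100)")
    overlap = int(chunk_size * overlap_pct / 100)
    stride = chunk_size - overlap  # how far each new chunk advances
    return overlap, stride

for pct in (10, 20, 30):
    overlap, stride = overlap_for(500, pct)
    print(f"{pct}% of 500 -> overlap {overlap}, stride {stride}")
# 10% of 500 -> overlap 50, stride 450
# 20% of 500 -> overlap 100, stride 400
# 30% of 500 -> overlap 150, stride 350
```

A smaller stride means more chunks per document, which is where the extra storage and retrieval cost of a larger overlap comes from.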

PREREQUISITES

  • Python 3.8+
  • OpenAI API key (free tier works)
  • pip install "openai>=1.0"

Setup

Install the openai Python package and set your API key as an environment variable for secure access.

bash
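# Quote the requirement so the shell does not treat ">" as a redirect
pip install "openai>=1.0"
# Placeholder value; substitute your own key
export OPENAI_API_KEY="your-api-key"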

Step by step

This example demonstrates how to split a document into chunks with 20% overlap, suitable for RAG pipelines.

python
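import os
from openai import OpenAI

# Client for downstream embedding or generation calls; chunking itself makes no API requests.
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

def chunk_text(text, chunk_size=500, overlap=100):
    """Split text into chunks of chunk_size characters that share overlap characters."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - overlap  # how far each new chunk advances
    chunks = []
    start = 0
    text_length = len(text)
    while start < text_length:
        end = min(start + chunk_size, text_length)
        chunks.append(text[start:end])
        if end == text_length:
            break  # last chunk reached; avoids a redundant tail chunk
        start += step
    return chunks

# Example usage (sizes here are measured in characters, not tokens)
sample_text = """Your long document text goes here. It can be several thousand tokens long."""
chunks = chunk_text(sample_text, chunk_size=500, overlap=100)
print(f"Generated {len(chunks)} chunks with 20% overlap.")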
output
Generated 1 chunks with 20% overlap.
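
The quick answer states sizes in tokens, while the function above slices by characters. Below is a minimal sketch of the same sliding window applied to a token list. It assumes whitespace-separated words as a rough stand-in for model tokens (a real tokenizer would give different counts), and `chunk_tokens` is a hypothetical name.

```python
from typing import List

def chunk_tokens(tokens: List[str], chunk_size: int = 500, overlap: int = 100) -> List[List[str]]:
    """Sliding-window chunking over a list of tokens (hypothetical helper)."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # this window already reaches the end of the document
    return chunks

# Stand-in "tokens": in practice these would come from text.split() or a tokenizer.
words = [f"w{i}" for i in range(1200)]
chunks = chunk_tokens(words)
print(len(chunks))                          # 3
print(chunks[0][-100:] == chunks[1][:100])  # True: consecutive chunks share 100 tokens
```

Each chunk can be turned back into text with `" ".join(chunk)` before embedding.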

Common variations

You can adjust overlap based on document type and retrieval needs:

  • Lower overlap (10%) for highly structured data like tables or code.
  • Higher overlap (30%) for narrative or conversational text to preserve context.
  • Use semantic chunking with embeddings to place chunk boundaries at topic shifts, which can reduce how much fixed overlap you need.

| Overlap % | Use case | Effect |
| --- | --- | --- |
| 10% | Structured data (tables, code) | Less redundancy, faster retrieval |
| 20% | General documents | Balanced context and efficiency |
| 30% | Narrative or conversational text | Better context continuity, more tokens |
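
The guidance above can be encoded as a small lookup so that a pipeline picks an overlap per document type. This is a sketch only: the dictionary keys and the function name `pick_overlap` are hypothetical, and the percentages are the starting points listed above rather than tuned values.

```python
# Starting-point overlap percentages by document type (keys are illustrative).
OVERLAP_PCT_BY_DOC_TYPE = {
    "structured": 10,  # tables, code
    "general": 20,     # general documents
    "narrative": 30,   # narrative or conversational text
}

def pick_overlap(chunk_size: int, doc_type: str = "general") -> int:
    """Return an overlap size for chunk_size, falling back to 20% for unknown types."""
    pct = OVERLAP_PCT_BY_DOC_TYPE.get(doc_type, 20)
    return chunk_size * pct // 100

print(pick_overlap(500, "structured"))  # 50
print(pick_overlap(500, "narrative"))   # 150
```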

Troubleshooting

If you notice poor answer quality or missing context in RAG outputs, increase the chunk overlap in steps of 10 percentage points (for example, from 20% to 30% of the chunk size) and re-evaluate after each change. Conversely, if retrieval latency or token usage is too high, reduce the overlap.

Also, ensure chunk boundaries do not split sentences or semantic units to avoid context loss.
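
One way to keep boundaries on sentence edges is to pack whole sentences into each chunk and carry the last sentence or two forward as the overlap. The sketch below assumes a simple regex split on `.`, `!`, and `?`; real text with abbreviations or decimals would need a proper sentence tokenizer. `chunk_sentences` is a hypothetical name, and `max_chars` is a soft limit that a single long sentence can exceed.

```python
import re
from typing import List

def chunk_sentences(text: str, max_chars: int = 500, overlap_sentences: int = 1) -> List[str]:
    """Pack whole sentences into chunks, repeating trailing sentences as overlap."""
    # Naive splitter: break after ., ! or ? followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks: List[str] = []
    current: List[str] = []
    for sentence in sentences:
        candidate = " ".join(current + [sentence])
        if current and len(candidate) > max_chars:
            chunks.append(" ".join(current))
            # Carry the last few sentences forward so context spans the boundary.
            current = current[-overlap_sentences:] if overlap_sentences > 0 else []
        current.append(sentence)
    if current:
        chunks.append(" ".join(current))
    return chunks

text = "Alpha is first. Beta is second. Gamma is third. Delta is fourth."
for chunk in chunk_sentences(text, max_chars=35, overlap_sentences=1):
    print(chunk)
# Alpha is first. Beta is second.
# Beta is second. Gamma is third.
# Gamma is third. Delta is fourth.
```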

Key Takeaways

  • Use 10-30% chunk overlap to balance context continuity and efficiency in RAG.
  • Adjust overlap based on document type: lower for structured data, higher for narratives.
  • Avoid splitting sentences or semantic units when chunking to preserve meaning.
Verified 2026-04 · gpt-4o, claude-3-5-sonnet-20241022