How-to · Intermediate · 3 min read

How chunk size affects RAG quality

Quick answer
In retrieval-augmented generation (RAG), chunk size directly affects both retrieval relevance and generation quality. Smaller chunks improve retrieval precision but can strip away surrounding context; larger chunks preserve context but risk retrieving less focused information. The optimal chunk size balances these two pressures for your documents and retrieval system.

Prerequisites

  • Python 3.8+
  • OpenAI API key (free tier works)
  • pip install "openai>=1.0" (quoted so the shell does not interpret >=)

Setup

Install the openai Python SDK and set your API key as an environment variable to run the example code.

  • Install SDK: pip install openai
  • Set API key in your shell: export OPENAI_API_KEY='your_api_key'
```bash
pip install openai
export OPENAI_API_KEY='your_api_key'
```

Step by step

This example demonstrates how to chunk a document into different sizes and query a model with the retrieved context for retrieval-augmented generation using gpt-4o-mini. It shows how chunk size affects retrieved context and final answer quality.

```python
import os
from openai import OpenAI

# Initialize the OpenAI client (reads the key exported in Setup)
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

# Sample document text (repeated so it is long enough to chunk)
document = "OpenAI develops advanced AI models that enable natural language understanding and generation. " * 50

# Split text into fixed-size character chunks
def chunk_text(text, chunk_size):
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

# Example chunk sizes to test
chunk_sizes = [100, 500, 1000]

for size in chunk_sizes:
    chunks = chunk_text(document, size)
    print(f"\nChunk size: {size}, Number of chunks: {len(chunks)}")

    # Simulate retrieval by selecting the first 2 chunks (mock retrieval;
    # a real system would embed the chunks and rank them by similarity)
    retrieved_chunks = chunks[:2]
    context = "\n".join(retrieved_chunks)

    # Ask the model to answer using only the retrieved context
    messages = [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": f"Based on the following context, summarize OpenAI's focus:\n{context}"},
    ]

    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=messages,
    )

    answer = response.choices[0].message.content
    print(f"Answer with chunk size {size}:\n{answer}")
```
Output (the document is 4,700 characters, so the chunk counts are deterministic; the model answers will vary):

```text
Chunk size: 100, Number of chunks: 47
Answer with chunk size 100:
OpenAI focuses on developing advanced AI models that enable natural language understanding and generation.

Chunk size: 500, Number of chunks: 10
Answer with chunk size 500:
OpenAI develops advanced AI models that enable natural language understanding and generation, focusing on creating powerful and versatile AI technologies.

Chunk size: 1000, Number of chunks: 5
Answer with chunk size 1000:
OpenAI is dedicated to advancing AI through the development of sophisticated models that excel in natural language processing and generation, aiming to create impactful AI solutions.
```
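Fixed-size character chunks like the ones above can cut a sentence in half at every boundary. A common refinement, not part of the original example, is to overlap consecutive chunks so that text near a boundary appears in both neighbors. The helper below is a minimal sketch of that idea (the function name and parameters are our own):

```python
def chunk_text_overlap(text, chunk_size, overlap):
    """Split text into fixed-size chunks where consecutive chunks
    share `overlap` characters, so boundary sentences are not lost."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

doc = "OpenAI develops advanced AI models. " * 10  # 360 characters
chunks = chunk_text_overlap(doc, chunk_size=100, overlap=20)
print(len(chunks))                        # → 5 (vs 4 without overlap)
print(chunks[0][-20:] == chunks[1][:20])  # → True: neighbors share 20 chars
```

The trade-off is index size: overlap duplicates text, so you store and embed more chunks for the same document.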

Common variations

You can experiment with asynchronous calls for concurrent requests, or swap in a larger model such as gpt-4o when you need more detailed answers. Streaming responses can be used for real-time output. Adjust chunk size based on document length and the capabilities of your retrieval system.

```python
import asyncio
import os
from openai import AsyncOpenAI

# Use the async client; the synchronous OpenAI client has no awaitable methods
client = AsyncOpenAI(api_key=os.environ["OPENAI_API_KEY"])

async def async_query(context):
    messages = [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": f"Summarize OpenAI's focus based on:\n{context}"},
    ]
    response = await client.chat.completions.create(
        model="gpt-4o-mini",
        messages=messages,
    )
    return response.choices[0].message.content

async def main():
    document = "OpenAI develops advanced AI models that enable natural language understanding and generation. " * 20
    chunks = [document[i:i + 300] for i in range(0, len(document), 300)]
    context = "\n".join(chunks[:2])
    summary = await async_query(context)
    print("Async summary:", summary)

asyncio.run(main())
```
Output (the model answer will vary):

```text
Async summary: OpenAI develops advanced AI models focused on natural language understanding and generation.
```

Troubleshooting

  • If retrieval returns irrelevant chunks, reduce chunk size to improve precision.
  • If generated answers lack context, increase chunk size to preserve more information.
  • Ensure your vector store or retrieval system supports the chunk size you choose.
  • Check API rate limits when querying multiple chunks.
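For the rate-limit point above, retrying with exponential backoff is usually enough. The wrapper below is a generic sketch (the helper name and parameters are our own, not part of the OpenAI SDK); with the real client you would pass `retry_on=(openai.RateLimitError,)`:

```python
import time

def with_backoff(fn, max_retries=5, base_delay=1.0, retry_on=(Exception,)):
    """Call fn(), retrying with exponentially growing delays when it
    raises one of the exception types in retry_on."""
    for attempt in range(max_retries):
        try:
            return fn()
        except retry_on:
            if attempt == max_retries - 1:
                raise  # out of retries; let the caller handle it
            time.sleep(base_delay * (2 ** attempt))

# Hypothetical usage with the chat call from the main example:
# answer = with_backoff(
#     lambda: client.chat.completions.create(model="gpt-4o-mini", messages=messages),
#     retry_on=(openai.RateLimitError,),
# )
```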

Key takeaways

  • Smaller chunk sizes improve retrieval precision but may lose broader context.
  • Larger chunks preserve context but can dilute retrieval relevance.
  • Balance chunk size based on document complexity and retrieval system capabilities.
Verified 2026-04 · gpt-4o-mini