How-to · Beginner · 3 min read

Small-to-big chunking explained

Quick answer
Small-to-big chunking is a technique in which text is first split into small chunks and then progressively merged into larger ones, balancing granularity against the model's context window. This lets you process long documents efficiently with models like gpt-4o.

Prerequisites

  • Python 3.8+
  • OpenAI API key
  • pip install "openai>=1.0"

Setup

Install the openai Python package and set your API key as an environment variable for secure access.

bash
pip install "openai>=1.0"
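Before running the example, export the key in your shell (the value below is a placeholder, not a real key):

```shell
# Make the key available to the Python client via the environment.
export OPENAI_API_KEY="sk-your-key-here"
```

Add this line to your shell profile (e.g. ~/.bashrc) if you want it to persist across sessions.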

Step by step

This example demonstrates small-to-big chunking by first splitting text into small chunks, then merging them into bigger chunks before sending to the gpt-4o model for summarization.

python
import os
from openai import OpenAI

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

# Sample text to chunk
text = (
    "Artificial intelligence is transforming industries by enabling new capabilities. "
    "However, large documents often exceed model context limits. "
    "Small-to-big chunking helps manage this by splitting and merging text efficiently."
)

# Step 1: Small chunking (split by sentences)
small_chunks = text.split('. ')

# Step 2: Merge small chunks into bigger chunks (combine 2 sentences each)
big_chunks = []
chunk_size = 2
for i in range(0, len(small_chunks), chunk_size):
    merged = '. '.join(small_chunks[i:i+chunk_size])
    if not merged.endswith('.'):  # Ensure punctuation
        merged += '.'
    big_chunks.append(merged)

# Step 3: Process each big chunk with the model
for idx, chunk in enumerate(big_chunks, 1):
    messages = [{"role": "user", "content": f"Summarize this: {chunk}"}]
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=messages
    )
    summary = response.choices[0].message.content
    print(f"Chunk {idx} summary:\n{summary}\n")

output
Chunk 1 summary:
Artificial intelligence is transforming industries by enabling new capabilities.

Chunk 2 summary:
Large documents can exceed model context limits, so small-to-big chunking manages this by splitting and merging text efficiently.

Common variations

  • Use async calls with asyncio and the AsyncOpenAI client (await client.chat.completions.create(...)) for concurrency.
  • Adjust chunk sizes dynamically based on token counts using a tokenizer library such as tiktoken.
  • Apply small-to-big chunking with other providers' models, e.g. claude-3-5-sonnet-20241022 via Anthropic's SDK, by swapping the client and the model parameter.
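The token-based variation can be sketched without an external tokenizer by approximating tokens as roughly four characters each (a common rule of thumb; in practice use a real tokenizer such as tiktoken). The helper name merge_to_budget is ours for illustration, not a library API:

```python
def merge_to_budget(small_chunks, max_tokens, est_tokens=lambda s: max(1, len(s) // 4)):
    """Greedily merge small chunks until the estimated token budget is reached."""
    big_chunks, current = [], ""
    for piece in small_chunks:
        candidate = f"{current} {piece}".strip()
        if current and est_tokens(candidate) > max_tokens:
            # Adding this piece would exceed the budget: close the current chunk.
            big_chunks.append(current)
            current = piece
        else:
            current = candidate
    if current:
        big_chunks.append(current)
    return big_chunks

sentences = [
    "AI is transforming industries.",
    "Large documents often exceed context limits.",
    "Chunking helps manage this.",
]
print(merge_to_budget(sentences, max_tokens=20))
```

Swap the est_tokens default for a real tokenizer call (e.g. counting tiktoken-encoded tokens) to make the budget exact rather than approximate.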

Troubleshooting

  • If you get context length errors, reduce the big chunk size or split more granularly.
  • Ensure your API key is set correctly in os.environ["OPENAI_API_KEY"].
  • Check network connectivity if requests time out.
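To act on the first tip, a chunk that triggers a context-length error can be re-split before retrying. A minimal sketch, reusing the sentence-splitting convention from the example above (the helper name halve_chunk is hypothetical):

```python
def halve_chunk(chunk):
    """Split an oversized chunk roughly in half at the nearest sentence boundary."""
    sentences = chunk.split('. ')
    if len(sentences) < 2:
        return [chunk]  # Cannot split further at the sentence level.
    mid = len(sentences) // 2
    left = '. '.join(sentences[:mid])
    right = '. '.join(sentences[mid:])
    if not left.endswith('.'):  # Restore the period removed by split().
        left += '.'
    return [left, right]

too_big = "First point. Second point. Third point. Fourth point."
print(halve_chunk(too_big))
```

Calling this on any chunk that the API rejects, then re-submitting the halves, gives a simple recursive fallback.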

Key takeaways

  • Small-to-big chunking balances detail and context size for efficient AI processing.
  • Start with small chunks and merge progressively to fit model context limits.
  • Adjust chunk sizes dynamically based on token count for best results.

Verified 2026-04 · gpt-4o, claude-3-5-sonnet-20241022