How to · Intermediate · 3 min read

Map-reduce for long context explained

Quick answer
Map-reduce for long context is a technique where a large input text is split into smaller chunks, each chunk is processed independently by an LLM to produce a summary or embedding (the map phase), and the partial results are then combined into a final output (the reduce phase). This approach works around context window limits by breaking information into manageable pieces and aggregating the results.
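The pattern itself is tiny and can be sketched in plain Python before any API enters the picture; here `summarize` is a stand-in for an LLM call, not a real function:

```python
def summarize(text: str) -> str:
    # Toy stand-in for an LLM call: pretend the first sentence is the summary.
    return text.split(". ")[0] + "."

chunks = ["Chunk one. Detail.", "Chunk two. Detail."]
partials = [summarize(c) for c in chunks]   # map: process each chunk independently
final = summarize(" ".join(partials))       # reduce: combine the partial results
print(final)
```

The real examples below keep exactly this shape and only swap the stand-in for `gpt-4o` calls.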

PREREQUISITES

  • Python 3.8+
  • OpenAI API key (free tier works)
  • pip install "openai>=1.0" (quoted so the shell does not treat >= as a redirect)

Setup

Install the openai Python package and set your API key as an environment variable.

bash
pip install "openai>=1.0"
output
Collecting openai
  Downloading openai-1.x.x-py3-none-any.whl
Installing collected packages: openai
Successfully installed openai-1.x.x

Step by step

This example demonstrates a simple map-reduce approach to summarize a long text using gpt-4o. The text is split into chunks, each chunk is summarized (map), and then the summaries are combined into a final summary (reduce).

python
import os
from openai import OpenAI

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

# Sample long text
long_text = (
    "In the world of AI, handling long documents is challenging due to context window limits. "
    "Map-reduce is a strategy to split the text into chunks, process each chunk independently, "
    "and then combine the results. This allows models to work with effectively unlimited context. "
    "Each chunk is summarized or embedded, and these partial outputs are aggregated to form a coherent final output."
)

# Split text into chunks (naive sentence split for the demo;
# production code should chunk by token count — see Common variations)
chunks = long_text.split('. ')

# Map phase: summarize each chunk
summaries = []
for chunk in chunks:
    if not chunk.strip():
        continue
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": f"Summarize this: {chunk.strip()}"}]
    )
    summary = response.choices[0].message.content.strip()
    summaries.append(summary)

# Reduce phase: combine summaries into final summary
combined_summary_prompt = (
    "Combine these summaries into one concise summary:\n" + "\n".join(summaries)
)
final_response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": combined_summary_prompt}]
)
final_summary = final_response.choices[0].message.content.strip()

print("Final summary:", final_summary)
output
Final summary: Map-reduce enables AI models to handle long texts by splitting them into smaller parts, summarizing each independently, and then combining these summaries into a coherent overall summary.

Common variations

  • Use asynchronous calls to speed up the map phase when processing many chunks.
  • Replace summarization with embedding generation for retrieval-augmented generation (RAG) workflows.
  • Use different models, such as gpt-4o-mini for cost-effective processing, or an alternative LLM such as claude-3-5-sonnet-20241022 (via the Anthropic API).
  • Implement chunking based on token counts rather than sentences for better control over context size.
python
import asyncio
import os
from openai import AsyncOpenAI

# The async client is required here: the sync OpenAI client's
# create() call cannot be awaited.
client = AsyncOpenAI(api_key=os.environ["OPENAI_API_KEY"])

async def summarize_chunk(chunk: str) -> str:
    response = await client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": f"Summarize this: {chunk.strip()}"}]
    )
    return response.choices[0].message.content.strip()

async def main():
    chunks = ["Chunk one text.", "Chunk two text.", "Chunk three text."]
    summaries = await asyncio.gather(*(summarize_chunk(c) for c in chunks))
    combined_prompt = "Combine these summaries into one:\n" + "\n".join(summaries)
    final_response = await client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": combined_prompt}]
    )
    print("Final summary:", final_response.choices[0].message.content.strip())

asyncio.run(main())
output
Final summary: Combined concise summary of all chunks.
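The token-based chunking variation can be approximated without extra dependencies by capping words per chunk; this sketch uses word count as a rough proxy for tokens (for exact limits you would swap in a real tokenizer such as tiktoken, which this example assumes you are not using):

```python
from typing import List

def chunk_by_size(text: str, max_words: int = 50) -> List[str]:
    """Greedily pack sentences into chunks of at most max_words words.

    Word count is only a proxy for token count; replace it with a
    tokenizer-based count for precise control over context size.
    """
    chunks, current, count = [], [], 0
    for sentence in text.split(". "):
        words = len(sentence.split())
        if current and count + words > max_words:
            chunks.append(". ".join(current) + ".")
            current, count = [], 0
        current.append(sentence.rstrip("."))
        count += words
    if current:
        chunks.append(". ".join(current) + ".")
    return chunks

print(chunk_by_size("One two three. Four five. Six seven eight nine.", max_words=5))
```

These chunks then feed directly into the map phase above in place of the sentence split.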

Troubleshooting

  • If you hit rate limits during the map phase, add delays or reduce concurrency.
  • If chunk summaries are too short or lose context, increase chunk size or provide more detailed prompts.
  • Ensure your chunking respects token limits of the model to avoid truncation errors.
  • Check your API key environment variable is set correctly to avoid authentication errors.
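For the rate-limit point above, a small exponential-backoff wrapper (a generic sketch, not a feature of the OpenAI SDK) can retry map-phase calls that fail:

```python
import random
import time

def with_backoff(fn, max_retries=5, base_delay=1.0):
    """Call fn(), retrying with jittered exponential backoff on failure.

    In real code, catch the specific error (e.g. openai.RateLimitError)
    instead of bare Exception.
    """
    for attempt in range(max_retries):
        try:
            return fn()
        except Exception:
            if attempt == max_retries - 1:
                raise
            # Wait roughly base_delay * 2^attempt seconds, plus jitter.
            time.sleep(base_delay * (2 ** attempt + random.random()))

# Usage with the map phase (client and chunk as in the main example):
# summary = with_backoff(lambda: client.chat.completions.create(...))
```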

Key Takeaways

  • Map-reduce breaks long texts into chunks to bypass LLM context window limits.
  • Process chunks independently (map) then combine results (reduce) for scalable summarization.
  • Use async calls and token-based chunking for efficiency and accuracy.
  • Choose models and chunk sizes based on cost, speed, and context constraints.
  • Handle API limits and chunking carefully to avoid errors and maintain quality.
Verified 2026-04 · gpt-4o, gpt-4o-mini, claude-3-5-sonnet-20241022