How to use map-reduce for long document summarization
Quick answer
Use the map-reduce approach by splitting a long document into chunks, summarizing each chunk with a chat.completions.create call (map step), then combining those summaries into a final summary with another chat.completions.create call (reduce step). This method handles token limits effectively for long document summarization.
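Stripped of any particular API, the pattern reduces to a few lines. In this sketch, summarize is a placeholder argument for whatever model call you use (the name map_reduce_summarize is illustrative, not from any library):

```python
def map_reduce_summarize(text, summarize, chunk_size=2000):
    # Map: split the document into fixed-size chunks and summarize each one
    chunks = [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
    partial_summaries = [summarize(chunk) for chunk in chunks]
    # Reduce: summarize the concatenated partial summaries into one result
    return summarize("\n\n".join(partial_summaries))
```

Any callable that maps text to a shorter text can be plugged in as summarize, including the API-backed function from the full example in this article.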
Prerequisites
- Python 3.8+
- OpenAI API key (free tier works)
- pip install openai>=1.0
Setup
Install the openai Python package and set your OpenAI API key as an environment variable.
- Install package:
pip install openai
- Set environment variable in your shell:
export OPENAI_API_KEY='your_api_key'

Output:
Collecting openai
  Downloading openai-1.x.x-py3-none-any.whl (xx kB)
Installing collected packages: openai
Successfully installed openai-1.x.x
Step by step
This example demonstrates the map-reduce summarization pattern using the gpt-4o-mini model. It splits a long text into chunks, summarizes each chunk (map), then summarizes those summaries (reduce) to produce a final concise summary.
import os
from openai import OpenAI

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

# Function to split text into chunks of max size (approximated by characters here)
def chunk_text(text, max_chunk_size=2000):
    chunks = []
    start = 0
    while start < len(text):
        end = start + max_chunk_size
        chunks.append(text[start:end])
        start = end
    return chunks

# Map step: summarize each chunk
def summarize_chunk(chunk):
    messages = [
        {"role": "user", "content": f"Summarize the following text concisely:\n\n{chunk}"}
    ]
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=messages,
        max_tokens=300
    )
    return response.choices[0].message.content.strip()

# Reduce step: summarize all chunk summaries into final summary
def reduce_summaries(summaries):
    combined = "\n\n".join(summaries)
    messages = [
        {"role": "user", "content": f"Summarize the following summaries into a concise final summary:\n\n{combined}"}
    ]
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=messages,
        max_tokens=500
    )
    return response.choices[0].message.content.strip()

# Example usage
if __name__ == "__main__":
    long_text = """Your very long document text goes here. It can be thousands of words long. """ * 50  # simulate long text
    chunks = chunk_text(long_text)
    print(f"Split into {len(chunks)} chunks.")
    chunk_summaries = []
    for i, chunk in enumerate(chunks, 1):
        print(f"Summarizing chunk {i}...")
        summary = summarize_chunk(chunk)
        chunk_summaries.append(summary)
    final_summary = reduce_summaries(chunk_summaries)
    print("\nFinal summary:\n", final_summary)

Output:
Split into 2 chunks.
Summarizing chunk 1...
Summarizing chunk 2...

Final summary:
This document covers the main points of the original text, providing a concise overview of the key topics discussed throughout the long document.
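For very long documents, the combined chunk summaries may themselves exceed the model's context window in a single reduce call. The reduce step can then be applied recursively, merging summaries in batches until one remains. A sketch, where recursive_reduce and batch_size are illustrative names and summarize stands in for a model call like summarize_chunk above:

```python
def recursive_reduce(summaries, summarize, batch_size=10):
    # Repeatedly merge summaries in batches until a single summary remains
    while len(summaries) > 1:
        summaries = [
            summarize("\n\n".join(summaries[i:i + batch_size]))
            for i in range(0, len(summaries), batch_size)
        ]
    return summaries[0]
```

With batch_size=10, 100 chunk summaries collapse to 10 intermediate summaries, then to 1, costing only a handful of extra API calls.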
Common variations
You can adapt the map-reduce summarization by:
- Using async calls with asyncio for parallel chunk summarization.
- Streaming partial summaries with stream=True for faster feedback.
- Choosing smaller models like gpt-4o-mini for cost efficiency.
- Adjusting chunk size based on token limits and document structure.
Async variation:

import asyncio
import os
from openai import AsyncOpenAI

client = AsyncOpenAI(api_key=os.environ["OPENAI_API_KEY"])

async def summarize_chunk_async(chunk):
    messages = [{"role": "user", "content": f"Summarize the following text concisely:\n\n{chunk}"}]
    response = await client.chat.completions.create(
        model="gpt-4o-mini",
        messages=messages,
        max_tokens=300
    )
    return response.choices[0].message.content.strip()

async def main():
    long_text = "Your very long document text goes here. " * 50
    chunks = [long_text[i:i+2000] for i in range(0, len(long_text), 2000)]
    # Parallel map step
    summaries = await asyncio.gather(*(summarize_chunk_async(c) for c in chunks))
    # Reduce step
    combined = "\n\n".join(summaries)
    messages = [{"role": "user", "content": f"Summarize the following summaries into a concise final summary:\n\n{combined}"}]
    final_response = await client.chat.completions.create(
        model="gpt-4o-mini",
        messages=messages,
        max_tokens=500
    )
    print("Final summary:", final_response.choices[0].message.content.strip())

if __name__ == "__main__":
    asyncio.run(main())

Output:
Final summary: This document provides a concise overview of the main points extracted from the original long text, summarizing key themes and insights effectively.
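For the chunk-size variation above, the character-based splitter can be replaced by one that targets an approximate token budget and prefers paragraph boundaries. This sketch uses the rough heuristic of about 4 characters per token for English text rather than an exact tokenizer, and the name chunk_by_tokens is illustrative:

```python
def chunk_by_tokens(text, max_tokens=1500, chars_per_token=4):
    # Convert the token budget to an approximate character budget
    budget = max_tokens * chars_per_token
    chunks, current = [], ""
    for para in text.split("\n\n"):
        # Start a new chunk when adding this paragraph would exceed the budget
        if current and len(current) + len(para) + 2 > budget:
            chunks.append(current)
            current = para
        else:
            current = current + "\n\n" + para if current else para
    if current:
        chunks.append(current)
    return chunks
```

Splitting on paragraph boundaries keeps related sentences together, which tends to improve per-chunk summary quality; a single paragraph longer than the budget still becomes its own oversized chunk, so pathological inputs may need a fallback character split.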
Troubleshooting
- If you hit token limit errors, reduce chunk size or max_tokens in calls.
- If summaries are too generic, add more detailed instructions in the prompt.
- For slow processing, use async calls or smaller models.
- Ensure your OPENAI_API_KEY is set correctly to avoid authentication errors.
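Rate-limit errors are also common when mapping many chunks in quick succession. A minimal retry wrapper with exponential backoff might look like the following sketch; with_retries is an illustrative name, and the generic exception parameter keeps it library-agnostic (the openai 1.x SDK raises openai.RateLimitError, which you could pass as retry_on):

```python
import time

def with_retries(fn, max_attempts=5, base_delay=1.0, retry_on=(Exception,)):
    # Retry fn() with exponential backoff: base_delay, 2x, 4x, ... between attempts
    for attempt in range(max_attempts):
        try:
            return fn()
        except retry_on:
            if attempt == max_attempts - 1:
                raise  # out of attempts; surface the error
            time.sleep(base_delay * (2 ** attempt))
```

Usage would wrap each map call, e.g. with_retries(lambda: summarize_chunk(chunk)), so transient failures do not abort a long summarization run.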
Key takeaways
- Split long documents into manageable chunks to avoid token limits during summarization.
- Summarize each chunk individually (map), then combine summaries for a final concise output (reduce).
- Use async calls to speed up chunk summarization when processing large documents.
- Adjust chunk size and model choice based on cost, speed, and quality trade-offs.
- Clear prompt instructions improve summary relevance and coherence.