How to use map-reduce for long document summarization
Quick answer
Use the map-reduce approach by splitting a long document into chunks, summarizing each chunk with a chat.completions.create call (map step), then combining those summaries into a final summary with another chat.completions.create call (reduce step). This method handles token limits effectively for long document summarization.
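Stripped of any particular API, the pattern reduces to a few lines. In this sketch, summarize is a placeholder argument for whatever model call you use (the name map_reduce_summarize is illustrative, not from any library):

```python
def map_reduce_summarize(text, summarize, chunk_size=2000):
    # Map: split the document into fixed-size chunks and summarize each one
    chunks = [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
    partial_summaries = [summarize(chunk) for chunk in chunks]
    # Reduce: summarize the concatenated partial summaries into one result
    return summarize("\n\n".join(partial_summaries))
```

Any callable that maps text to a shorter text can be plugged in as summarize, including the API-backed function from the full example in this article.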
Prerequisites
- Python 3.8+
- OpenAI API key (free tier works)
- pip install openai>=1.0
Setup
Install the openai Python package and set your OpenAI API key as an environment variable.
- Install package:
pip install openai
- Set environment variable in your shell:
export OPENAI_API_KEY='your_api_key'

Output:
Collecting openai
  Downloading openai-1.x.x-py3-none-any.whl (xx kB)
Installing collected packages: openai
Successfully installed openai-1.x.x
Step by step
This example demonstrates the map-reduce summarization pattern using the gpt-4o-mini model. It splits a long text into chunks, summarizes each chunk (map), then summarizes those summaries (reduce) to produce a final concise summary.
import os
from openai import OpenAI

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

# Function to split text into chunks of max size (approximated by characters here)
def chunk_text(text, max_chunk_size=2000):
    chunks = []
    start = 0
    while start < len(text):
        end = start + max_chunk_size
        chunks.append(text[start:end])
        start = end
    return chunks

# Map step: summarize each chunk
def summarize_chunk(chunk):
    messages = [
        {"role": "user", "content": f"Summarize the following text concisely:\n\n{chunk}"}
    ]
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=messages,
        max_tokens=300
    )
    return response.choices[0].message.content.strip()

# Reduce step: summarize all chunk summaries into final summary
def reduce_summaries(summaries):
    combined = "\n\n".join(summaries)
    messages = [
        {"role": "user", "content": f"Summarize the following summaries into a concise final summary:\n\n{combined}"}
    ]
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=messages,
        max_tokens=500
    )
    return response.choices[0].message.content.strip()

# Example usage
if __name__ == "__main__":
    long_text = """Your very long document text goes here. It can be thousands of words long. """ * 50  # simulate long text
    chunks = chunk_text(long_text)
    print(f"Split into {len(chunks)} chunks.")
    chunk_summaries = []
    for i, chunk in enumerate(chunks, 1):
        print(f"Summarizing chunk {i}...")
        summary = summarize_chunk(chunk)
        chunk_summaries.append(summary)
    final_summary = reduce_summaries(chunk_summaries)
    print("\nFinal summary:\n", final_summary)

Output:
Split into 2 chunks.
Summarizing chunk 1...
Summarizing chunk 2...

Final summary:
This document covers the main points of the original text, providing a concise overview of the key topics discussed throughout the long document.
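For very long documents, the combined chunk summaries may themselves exceed the model's context window in a single reduce call. The reduce step can then be applied recursively, merging summaries in batches until one remains. A sketch, where recursive_reduce and batch_size are illustrative names and summarize stands in for a model call like summarize_chunk above:

```python
def recursive_reduce(summaries, summarize, batch_size=10):
    # Repeatedly merge summaries in batches until a single summary remains
    while len(summaries) > 1:
        summaries = [
            summarize("\n\n".join(summaries[i:i + batch_size]))
            for i in range(0, len(summaries), batch_size)
        ]
    return summaries[0]
```

With batch_size=10, 100 chunk summaries collapse to 10 intermediate summaries, then to 1, costing only a handful of extra API calls.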
Common variations
You can adapt the map-reduce summarization by:
- Using async calls with asyncio for parallel chunk summarization.
- Streaming partial summaries with stream=True for faster feedback.
- Choosing smaller models like gpt-4o-mini for cost efficiency.
- Adjusting chunk size based on token limits and document structure.
Async variation:

import asyncio
import os
from openai import AsyncOpenAI

client = AsyncOpenAI(api_key=os.environ["OPENAI_API_KEY"])

async def summarize_chunk_async(chunk):
    messages = [{"role": "user", "content": f"Summarize the following text concisely:\n\n{chunk}"}]
    response = await client.chat.completions.create(
        model="gpt-4o-mini",
        messages=messages,
        max_tokens=300
    )
    return response.choices[0].message.content.strip()

async def main():
    long_text = "Your very long document text goes here. " * 50
    chunks = [long_text[i:i+2000] for i in range(0, len(long_text), 2000)]
    # Parallel map step
    summaries = await asyncio.gather(*(summarize_chunk_async(c) for c in chunks))
    # Reduce step
    combined = "\n\n".join(summaries)
    messages = [{"role": "user", "content": f"Summarize the following summaries into a concise final summary:\n\n{combined}"}]
    final_response = await client.chat.completions.create(
        model="gpt-4o-mini",
        messages=messages,
        max_tokens=500
    )
    print("Final summary:", final_response.choices[0].message.content.strip())

if __name__ == "__main__":
    asyncio.run(main())

Output:
Final summary: This document provides a concise overview of the main points extracted from the original long text, summarizing key themes and insights effectively.
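For the chunk-size variation above, the character-based splitter can be replaced by one that targets an approximate token budget and prefers paragraph boundaries. This sketch uses the rough heuristic of about 4 characters per token for English text rather than an exact tokenizer, and the name chunk_by_tokens is illustrative:

```python
def chunk_by_tokens(text, max_tokens=1500, chars_per_token=4):
    # Convert the token budget to an approximate character budget
    budget = max_tokens * chars_per_token
    chunks, current = [], ""
    for para in text.split("\n\n"):
        # Start a new chunk when adding this paragraph would exceed the budget
        if current and len(current) + len(para) + 2 > budget:
            chunks.append(current)
            current = para
        else:
            current = current + "\n\n" + para if current else para
    if current:
        chunks.append(current)
    return chunks
```

Splitting on paragraph boundaries keeps related sentences together, which tends to improve per-chunk summary quality; a single paragraph longer than the budget still becomes its own oversized chunk, so pathological inputs may need a fallback character split.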
Troubleshooting
- If you hit token limit errors, reduce chunk size or max_tokens in calls.
- If summaries are too generic, add more detailed instructions in the prompt.
- For slow processing, use async calls or smaller models.
- Ensure your OPENAI_API_KEY is set correctly to avoid authentication errors.
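Rate-limit errors are also common when mapping many chunks in quick succession. A minimal retry wrapper with exponential backoff might look like the following sketch; with_retries is an illustrative name, and the generic exception parameter keeps it library-agnostic (the openai 1.x SDK raises openai.RateLimitError, which you could pass as retry_on):

```python
import time

def with_retries(fn, max_attempts=5, base_delay=1.0, retry_on=(Exception,)):
    # Retry fn() with exponential backoff: base_delay, 2x, 4x, ... between attempts
    for attempt in range(max_attempts):
        try:
            return fn()
        except retry_on:
            if attempt == max_attempts - 1:
                raise  # out of attempts; surface the error
            time.sleep(base_delay * (2 ** attempt))
```

Usage would wrap each map call, e.g. with_retries(lambda: summarize_chunk(chunk)), so transient failures do not abort a long summarization run.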
Key takeaways
- Split long documents into manageable chunks to avoid token limits during summarization.
- Summarize each chunk individually (map), then combine summaries for a final concise output (reduce).
- Use async calls to speed up chunk summarization when processing large documents.
- Adjust chunk size and model choice based on cost, speed, and quality trade-offs.
- Clear prompt instructions improve summary relevance and coherence.