How-to · Intermediate · 4 min read

Recursive summarization explained

Quick answer
Recursive summarization is a technique where a large text is broken into smaller chunks, each chunk is summarized individually with an AI model such as gpt-4o, and those summaries are themselves summarized, recursively, until a single concise summary remains. This makes it possible to summarize documents far longer than a model's context window.

PREREQUISITES

  • Python 3.8+
  • OpenAI API key
  • pip install "openai>=1.0"

Setup

Install the openai Python package and set your API key as an environment variable for secure authentication.

bash
pip install "openai>=1.0"
output
Collecting openai
  Downloading openai-1.x.x-py3-none-any.whl (xx kB)
Installing collected packages: openai
Successfully installed openai-1.x.x
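Then export your API key so the client can pick it up from the environment (the value shown is a placeholder for your own key):

```shell
# Set the key for the current shell session (macOS/Linux);
# on Windows PowerShell, use: $Env:OPENAI_API_KEY = "..."
export OPENAI_API_KEY="your-key-here"
```

Add the line to your shell profile (e.g. ~/.bashrc) to make it persistent.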

Step by step

This example demonstrates recursive summarization by splitting a long text into chunks, summarizing each chunk with gpt-4o, and then summarizing the combined chunk summaries recursively until a final summary is obtained.

python
import os
from openai import OpenAI

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

# Function to summarize a single text chunk

def summarize_chunk(text: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": f"Summarize this text concisely:\n\n{text}"}]
    )
    return response.choices[0].message.content.strip()

# Recursive summarization function

def recursive_summarize(text: str, chunk_size: int = 1000) -> str:
    # Base case: if text is short enough, summarize directly
    if len(text) <= chunk_size:
        return summarize_chunk(text)

    # Split text into fixed-size chunks (character-based for simplicity;
    # a production splitter would respect sentence or token boundaries)
    chunks = [text[i:i+chunk_size] for i in range(0, len(text), chunk_size)]

    # Summarize each chunk
    chunk_summaries = [summarize_chunk(chunk) for chunk in chunks]

    # Combine summaries and recurse
    combined_summary = "\n".join(chunk_summaries)
    return recursive_summarize(combined_summary, chunk_size)

# Example usage
if __name__ == "__main__":
    long_text = (
        "OpenAI's GPT models can handle large texts by breaking them down into manageable chunks. "
        "Recursive summarization helps condense very long documents by summarizing summaries. "
        "This technique is useful for books, research papers, or lengthy reports where a single prompt would exceed token limits. "
        "By recursively summarizing, you maintain context while reducing length step-by-step."
    ) * 10  # Repeat to simulate a long document

    final_summary = recursive_summarize(long_text, chunk_size=500)
    print("Final summary:\n", final_summary)
output
Final summary:
 OpenAI's GPT models can summarize large texts by breaking them into chunks and recursively summarizing these summaries, enabling concise overviews of very long documents while preserving context.

Common variations

  • Use asynchronous calls with asyncio and the AsyncOpenAI client for faster parallel chunk summarization.
  • Adjust chunk_size based on model token limits and text complexity.
  • Use different models like gpt-4o-mini for cost-effective summarization or claude-3-5-sonnet-20241022 with Anthropic SDK.
  • Incorporate prompt engineering to customize summary style or length.
python
import os
import asyncio
from openai import AsyncOpenAI

# Use the async client so chunk requests can run concurrently
client = AsyncOpenAI(api_key=os.environ["OPENAI_API_KEY"])

async def summarize_chunk_async(text: str) -> str:
    response = await client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": f"Summarize this text concisely:\n\n{text}"}]
    )
    return response.choices[0].message.content.strip()

async def recursive_summarize_async(text: str, chunk_size: int = 1000) -> str:
    if len(text) <= chunk_size:
        return await summarize_chunk_async(text)

    chunks = [text[i:i+chunk_size] for i in range(0, len(text), chunk_size)]

    # Summarize chunks concurrently
    chunk_summaries = await asyncio.gather(*(summarize_chunk_async(c) for c in chunks))

    combined_summary = "\n".join(chunk_summaries)
    return await recursive_summarize_async(combined_summary, chunk_size)

# Usage example
if __name__ == "__main__":
    # In Jupyter (which already runs an event loop), await the coroutine
    # directly instead of calling asyncio.run()
    long_text = "Your very long text here..." * 10

    final_summary = asyncio.run(recursive_summarize_async(long_text, chunk_size=500))
    print("Final async summary:\n", final_summary)
output
Final async summary:
 OpenAI's GPT models can recursively summarize large texts by chunking and summarizing summaries, enabling efficient and concise overviews of lengthy documents.
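The chunk_size variation above uses raw character counts, which can cut sentences in half mid-chunk. One way to avoid that is to pack whole sentences into chunks; here is a minimal sketch (the chunk_by_sentences name and the naive regex split are illustrative, not part of any SDK):

```python
import re

def chunk_by_sentences(text: str, chunk_size: int = 1000):
    """Greedily pack whole sentences into chunks of at most chunk_size chars."""
    # Naive sentence split on ., !, or ? followed by whitespace
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        # Start a new chunk when adding this sentence would overflow
        if current and len(current) + 1 + len(sentence) > chunk_size:
            chunks.append(current)
            current = sentence
        else:
            current = f"{current} {sentence}".strip()
    if current:
        chunks.append(current)
    return chunks

text = "First sentence here. Second sentence follows! Third one ends it?"
print(chunk_by_sentences(text, chunk_size=40))
# → ['First sentence here.', 'Second sentence follows!', 'Third one ends it?']
```

Note the limitation: a single sentence longer than chunk_size still becomes its own oversized chunk, so very long sentences would need a character-based fallback.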

Troubleshooting

  • If you get RateLimitError, reduce concurrency or add retry logic with exponential backoff.
  • If summaries are too generic, improve prompts by adding instructions like "Focus on key points" or "Use bullet points."
  • For token limit errors, decrease chunk_size or switch to smaller models.
  • Ensure your OPENAI_API_KEY environment variable is set correctly to avoid authentication errors.
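The retry advice above can be sketched as a generic wrapper; the with_retries name and its parameters are illustrative, and in real code you would pass openai.RateLimitError as retry_on rather than the broad default used here:

```python
import time
import random

def with_retries(fn, max_attempts=5, base_delay=1.0, retry_on=Exception):
    """Call fn(), retrying on retry_on with exponential backoff plus jitter."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except retry_on:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            # Sleep base_delay, 2x, 4x, ... plus jitter to avoid thundering herds
            time.sleep(base_delay * (2 ** attempt) + random.uniform(0, base_delay))

# In the summarizer you would wrap each API call, e.g.:
# summary = with_retries(lambda: summarize_chunk(chunk), retry_on=openai.RateLimitError)
```

Catching a specific exception type matters: retrying on every error would also hide authentication or prompt mistakes that no amount of waiting will fix.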

Key Takeaways

  • Recursive summarization breaks large texts into chunks and summarizes them stepwise to handle token limits.
  • Use the OpenAI SDK v1 pattern with client.chat.completions.create for clean, production-ready code.
  • Adjust chunk size and model choice based on your document length and cost constraints.
  • Async summarization speeds up processing by parallelizing chunk summaries.
  • Improve summary quality by refining prompts and handling API rate limits gracefully.
Verified 2026-04 · gpt-4o, gpt-4o-mini, claude-3-5-sonnet-20241022