How-to · Intermediate · 3 min read

Hierarchical summarization explained

Quick answer
Hierarchical summarization is a technique where a large document is split into smaller chunks, each chunk is summarized individually by a language model, and the resulting summaries are recursively combined and re-summarized into one concise overall summary. This lets you process texts that exceed a single model call's context window by summarizing in multiple levels.
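The splitting step itself can be very simple. Here is a minimal word-based chunking sketch (the helper name `chunk_text` and the 200-word limit are illustrative choices; in practice you would usually chunk by tokens, e.g. with the tiktoken library, to match the model's limits):

python
def chunk_text(text, max_words=200):
    """Split text into chunks of at most max_words words."""
    words = text.split()
    return [
        " ".join(words[i:i + max_words])
        for i in range(0, len(words), max_words)
    ]

doc = "word " * 500          # a 500-word stand-in document
chunks = chunk_text(doc, max_words=200)
print(len(chunks))           # → 3 (200 + 200 + 100 words)

Each of these chunks is then small enough to summarize in a single model call.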

PREREQUISITES

  • Python 3.8+
  • OpenAI API key (free tier works)
  • pip install "openai>=1.0" (quoted so the shell doesn't treat >= as a redirect)

Setup

Install the openai Python package and set your API key as an environment variable for authentication.

bash
pip install openai
output
Collecting openai
  Downloading openai-1.x.x-py3-none-any.whl (xx kB)
Installing collected packages: openai
Successfully installed openai-1.x.x

Step by step

This example demonstrates hierarchical summarization by splitting a long text into chunks, summarizing each chunk with gpt-4o, then summarizing the combined chunk summaries to get a final summary.

python
import os
from openai import OpenAI

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

# Sample long text split into chunks
chunks = [
    "Chunk 1: The history of AI began in the 1950s...",
    "Chunk 2: In recent years, large language models have advanced...",
    "Chunk 3: Applications of AI include healthcare, finance, and more..."
]

# Function to summarize a single chunk
def summarize_chunk(text):
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": f"Summarize this text concisely:\n{text}"}]
    )
    return response.choices[0].message.content.strip()

# Summarize each chunk
chunk_summaries = [summarize_chunk(chunk) for chunk in chunks]

# Combine chunk summaries into one text
combined_summary_text = "\n".join(chunk_summaries)

# Summarize the combined summaries for final output
final_summary = summarize_chunk(combined_summary_text)

print("Final hierarchical summary:\n", final_summary)
output
Final hierarchical summary:
 AI's history started in the 1950s, evolving rapidly with large language models recently. Its applications span healthcare, finance, and more, showcasing broad impact.

Common variations

  • Use asynchronous calls with asyncio for parallel chunk summarization.
  • Adjust chunk size based on LLM context window limits.
  • Use different models, e.g. gpt-4o-mini for cost efficiency or claude-3-5-sonnet-20241022 (via the Anthropic API) as an alternative.
  • Implement multi-level recursion for very large documents by summarizing summaries repeatedly.
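The last variation, multi-level recursion, can be sketched as a loop that keeps summarizing batches of summaries until only one remains. The `summarize` parameter stands in for a real model call such as `summarize_chunk` from the main example; the batch size of 3 is an illustrative assumption, and a toy summarizer is used here so the sketch runs without an API key:

python
def hierarchical_summary(chunks, summarize, batch_size=3):
    """Recursively reduce chunk summaries until one summary remains."""
    summaries = [summarize(c) for c in chunks]
    while len(summaries) > 1:
        # Group current summaries into batches and summarize each batch.
        batches = [
            "\n".join(summaries[i:i + batch_size])
            for i in range(0, len(summaries), batch_size)
        ]
        summaries = [summarize(b) for b in batches]
    return summaries[0]

# Toy summarizer for illustration: keep only the first sentence.
toy = lambda text: text.split(".")[0] + "."
print(hierarchical_summary(["A. x.", "B. y.", "C. z.", "D. w."], toy))  # → A.

With a real model, you would pass `summarize_chunk` as the `summarize` argument and pick `batch_size` so each joined batch stays well under the model's context limit.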

Troubleshooting

  • If summaries are too generic, provide more detailed instructions in the prompt.
  • If you hit rate limits, add retry logic or slow down requests.
  • Ensure chunk sizes fit within the model's token limit to avoid truncation.
  • Check environment variable OPENAI_API_KEY is set correctly to avoid authentication errors.
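For the rate-limit case, a small retry helper with exponential backoff is often enough. A sketch (the name `with_retries` and the delay values are assumptions, not part of the openai library; the official client also accepts a built-in `max_retries` setting):

python
import time

def with_retries(fn, max_attempts=3, base_delay=1.0):
    """Call fn(), retrying with exponential backoff on any exception."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts, re-raise the last error
            time.sleep(base_delay * 2 ** attempt)  # 1s, 2s, 4s, ...

# Usage with the summarizer from the main example:
# summary = with_retries(lambda: summarize_chunk(chunk))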

Key Takeaways

  • Hierarchical summarization manages large texts by chunking and recursively summarizing to fit LLM context limits.
  • Use concise prompts for chunk summaries to maintain clarity in the final summary.
  • Adjust chunk size and model choice based on cost and performance needs.
  • Async processing can speed up summarization of multiple chunks.
  • Always verify API keys and token limits to avoid errors.
Verified 2026-04 · gpt-4o, gpt-4o-mini, claude-3-5-sonnet-20241022