
Tree summarize: hierarchical synthesis

What you will learn

Tree summarize recursively condenses documents by building a hierarchy of summaries, letting you process hundreds of pages without overflowing the model's context window.

Why this matters

When you have too many documents for a single LLM call, tree summarize scales by summarizing groups of documents and then summarizing those summaries. Each layer compresses the text further while preserving its meaning, so large document sets fit through a model with a fixed context window.

Skip if: all of your text already fits in a single context window (often the case with fewer than 10 short documents), or you need exact quote-level precision: the hierarchical abstraction can lose granular details in favor of semantic compression.

Explanation

What it is: Tree summarize is a recursive summarization strategy that builds a tree of summaries. Instead of sending all documents to the LLM in one call, it groups them into batches, summarizes each batch, then summarizes the summaries. This creates a hierarchy that progressively compresses information.

How it works: You pass a query and a list of text chunks to the synthesizer. It groups the chunks into batches, sends each batch to the LLM for summarization, then treats those summaries as a new layer of input. This continues until a single summary remains. In LlamaIndex's TreeSummarize, a batch is not a fixed number of documents: the synthesizer packs as much text into each call as the model's context window allows, so a small input set is handled in a single call. The key insight: with a fixed batch size, each layer divides the document count by that batch size, so with batches of 5, 100 documents → 20 summaries → 4 summaries → 1 summary in just 3 rounds.
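The layer arithmetic can be checked with a short Python sketch. It assumes a fixed batch size, which is a simplification (LlamaIndex's TreeSummarize packs chunks by token budget rather than by count), and `level_sizes` is a hypothetical helper, not a library function.

```python
import math


def level_sizes(num_docs: int, batch_size: int) -> list[int]:
    """Return how many texts exist at each level of a fixed-batch summary tree.

    Level 0 is the original documents; every later level holds one summary
    per batch of the level below, until a single summary remains.
    """
    if batch_size < 2:
        raise ValueError("batch_size must be at least 2 for the tree to shrink")
    sizes = [num_docs]
    while sizes[-1] > 1:
        # A partially filled last batch still costs one summary, hence ceil.
        sizes.append(math.ceil(sizes[-1] / batch_size))
    return sizes


print(level_sizes(100, 5))  # [100, 20, 4, 1] -> 3 rounds of summarization
print(level_sizes(100, 4))  # [100, 25, 7, 2, 1] -> 4 rounds
```

Because each level rounds up, a batch size of 4 needs an extra round for 100 documents: the 25 first-level summaries become 7, then 2, then 1.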

When to use it: Use tree summarize when the combined text of your documents will not fit in one context window, when you have a limited token budget per call, or when you're running batched inference. It's ideal for document analysis, research synthesis, and multi-document question answering where you don't need fine-grained context from every document.

Analogy

Think of a newspaper editor reading 100 articles. Instead of memorizing every detail, they write a 1-page summary for every 10 articles (10 summaries). Then they write a 1-page summary of those summaries (1 summary). They've now condensed 100 articles into a single page they can hold in working memory, without rereading the originals.

Code

Illustrative only - not runnable without a valid API key

```python
# Requires: pip install llama-index-core llama-index-llms-openai
import os

from llama_index.core import Document, Settings
from llama_index.core.response_synthesizers import TreeSummarize
# LlamaIndex's LLM wrapper; the raw openai.OpenAI client cannot be assigned to Settings.llm
from llama_index.llms.openai import OpenAI

# Placeholder only: prefer exporting OPENAI_API_KEY in your shell instead of hardcoding it.
os.environ.setdefault('OPENAI_API_KEY', 'sk-your-key-here')

# Every LLM call made by the synthesizer uses this model.
Settings.llm = OpenAI(model='gpt-4-turbo')

docs = [
    Document(text='The Amazon rainforest spans 5.5 million square kilometers across nine countries. It produces 20% of the world\'s oxygen and stores 150-200 billion tons of carbon.'),
    Document(text='Deforestation in the Amazon has accelerated due to cattle ranching and agriculture. Between 2000 and 2020, approximately 17% of the original forest was lost.'),
    Document(text='Indigenous communities have inhabited the Amazon for over 11,000 years. They maintain sustainable practices that preserve 80% more biodiversity than protected areas without indigenous management.'),
    Document(text='The Amazon River is 6,400 kilometers long and discharges 209,000 cubic meters of water per second into the Atlantic Ocean. It contains 10% of all river water on Earth.'),
    Document(text='Climate change threatens Amazon stability through increased droughts and forest fires. Scientists warn of a tipping point where the rainforest could convert to savanna within 50 years.'),
]

# TreeSummarize uses Settings.llm when no llm argument is given.
summarizer = TreeSummarize()
# get_response() expects plain strings (text_chunks), so pass each document's text.
response = summarizer.get_response(
    query_str='What is the ecological and cultural significance of the Amazon?',
    text_chunks=[doc.text for doc in docs],
)

print('Final Summary:')
print(response)
```

Output (illustrative; the exact wording varies from run to run)

Final Summary:
The Amazon rainforest is a critical global ecosystem spanning 5.5 million square kilometers, producing 20% of the world's oxygen and storing 150-200 billion tons of carbon. The Amazon River, discharging 209,000 cubic meters of water per second, contains 10% of Earth's river water. Indigenous communities, who have inhabited the region for 11,000 years, maintain sustainable practices preserving 80% more biodiversity than protected areas. However, the Amazon faces severe threats: 17% of the original forest has been lost to deforestation driven by cattle ranching and agriculture since 2000, while climate change poses a critical risk of converting the rainforest to savanna within 50 years, potentially triggering an irreversible tipping point.

What just happened?

The code created five documents about the Amazon and passed them to TreeSummarize along with a query. Because five short documents fit comfortably within GPT-4-turbo's context window, TreeSummarize repacks them into a single prompt and answers in one LLM call; the tree only grows extra levels when the combined text overflows the context window. With a larger corpus, it would summarize each packed chunk, then summarize those summaries, until a single response synthesizes information from every document.
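Why five documents collapse into one call can be seen with a simplified packing sketch. It assumes roughly 4 characters per token and a greedy fill of each prompt up to a token budget; LlamaIndex's actual repacking uses the model's tokenizer and also splits oversized text, so treat `estimate_tokens` and `pack_chunks` as hypothetical stand-ins.

```python
def estimate_tokens(text: str) -> int:
    # Rough heuristic (assumption): about 4 characters per token for English text.
    return max(1, len(text) // 4)


def pack_chunks(texts: list[str], budget_tokens: int) -> list[list[str]]:
    """Greedily group texts so each group's estimated size stays within budget_tokens.

    A single text larger than the budget ends up alone in its own group;
    a real implementation would split it first.
    """
    groups: list[list[str]] = []
    current: list[str] = []
    used = 0
    for text in texts:
        cost = estimate_tokens(text)
        # Start a new group if adding this text would overflow the current one.
        if current and used + cost > budget_tokens:
            groups.append(current)
            current, used = [], 0
        current.append(text)
        used += cost
    if current:
        groups.append(current)
    return groups


# Five short texts (about 50 estimated tokens each) against a generous budget.
short_docs = ["x" * 200] * 5
print(len(pack_chunks(short_docs, budget_tokens=100_000)))  # 1 group -> one LLM call

# 100 texts of about 1,000 estimated tokens each against an 8,000-token budget.
long_docs = ["x" * 4000] * 100
print(len(pack_chunks(long_docs, budget_tokens=8_000)))  # 13 groups -> 13 first-level summaries
```

When the packer returns a single group, tree summarize degenerates into one ordinary LLM call; extra levels appear only when more than one group is needed.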

Common gotcha

Developers often assume tree summarize returns intermediate summaries or a tree structure they can inspect. It doesn't: it returns only the final top-level summary (TreeSummarize(verbose=True) prints only how many chunks remain after each repacking step). If you need to see how documents were grouped or what the intermediate summaries said, implement the batching yourself or use a different approach, such as sequential summarization with logging.
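If you do want to inspect the tree, a hand-rolled version that keeps every level is short. This is a sketch, not LlamaIndex's implementation: it uses a fixed `batch_size`, and `summarize` is a hypothetical callable you would back with your own LLM call. The stand-in `fake_summarize` only shows which texts were merged, so the example runs offline.

```python
from typing import Callable


def tree_summarize_with_levels(
    texts: list[str],
    summarize: Callable[[list[str]], str],
    batch_size: int = 5,
) -> list[list[str]]:
    """Build a summary tree bottom-up and return every level for inspection.

    levels[0] is the input; levels[-1] holds the single final summary.
    """
    if not texts:
        raise ValueError("need at least one text")
    if batch_size < 2:
        raise ValueError("batch_size must be at least 2")
    levels = [list(texts)]
    while len(levels[-1]) > 1:
        current = levels[-1]
        batches = [current[i : i + batch_size] for i in range(0, len(current), batch_size)]
        # One summarize() call per batch produces the next, smaller level.
        levels.append([summarize(batch) for batch in batches])
    return levels


def fake_summarize(batch: list[str]) -> str:
    # Offline stand-in for an LLM call: records which texts were merged.
    return "(" + "+".join(batch) + ")"


levels = tree_summarize_with_levels([f"d{i}" for i in range(7)], fake_summarize, batch_size=3)
for depth, level in enumerate(levels):
    print(depth, level)
```

With seven inputs and a batch size of 3, the printout shows three levels: the seven originals, three partial summaries, and the single final summary. Note the lone `(d6)` batch: a production version might pass single-item batches through unchanged to save an LLM call.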

Error recovery

RateLimitError

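Tree summarize makes multiple LLM calls, one per batch at every level. If you hit rate limits, keep the calls sequential (leave TreeSummarize's use_async option off, since it sends a level's batches concurrently) and add exponential backoff: wrap the summarizer call in a retry loop that sleeps 2 ** attempt seconds between attempts. Retrying the whole get_response() call repeats every batch, so retrying closer to the individual LLM call wastes less work.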
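A minimal retry helper with exponential backoff might look like the following. `call_with_backoff` is a hypothetical name; it adds a little random jitter on top of the `2 ** attempt` delay, and it takes the exception types to retry and the sleep function as parameters so it can be exercised without a real API.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_backoff(
    fn: Callable[[], T],
    retry_on: tuple[type[BaseException], ...],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn(), retrying on the given exception types with exponential backoff."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return fn()
        except retry_on:
            if attempt == max_attempts - 1:
                raise  # out of attempts: re-raise the last error
            # Wait base_delay * 2**attempt seconds (1s, 2s, 4s, ...) plus up to 1s of jitter.
            sleep(base_delay * 2 ** attempt + random.uniform(0, 1))
    raise AssertionError("unreachable")


# Offline demo: a function that fails twice with a stand-in rate-limit error.
class FakeRateLimitError(Exception):
    pass


calls = {"n": 0}


def flaky_summarize() -> str:
    calls["n"] += 1
    if calls["n"] < 3:
        raise FakeRateLimitError("429 Too Many Requests")
    return "summary"


delays: list[float] = []
result = call_with_backoff(flaky_summarize, retry_on=(FakeRateLimitError,), sleep=delays.append)
print(result, calls["n"], len(delays))  # summary 3 2
```

In real code you would pass something like `lambda: summarizer.get_response(query_str=q, text_chunks=chunks)` as `fn` and `(openai.RateLimitError,)` as `retry_on` (openai Python SDK 1.x).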

AttributeError: 'list' object has no attribute 'node_id'

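This error points to a container-type mismatch, for example a nested list (nodes=[docs] instead of nodes=docs). get_response() takes text_chunks, a flat list of strings, so pass [doc.text for doc in docs]. synthesize() takes a flat list of NodeWithScore objects, so wrap each document with NodeWithScore(node=doc) (imported from llama_index.core.schema). In LlamaIndex, Document is already a subclass of TextNode, so no other conversion is needed.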

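BadRequestError: context_length_exceeded

Even hierarchical summarization can hit token limits if a single document is enormous or if the prompt plus the expected output leaves too little room. Split documents into smaller chunks before passing them to TreeSummarize, for example with a node parser such as SentenceSplitter(chunk_size=1024, chunk_overlap=20). SimpleDirectoryReader only loads files; it does not chunk them.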
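The idea behind `chunk_size` and `chunk_overlap` can be illustrated with a plain-Python sliding window. This sketch counts whitespace-separated words as a stand-in for tokens and ignores sentence boundaries, both of which SentenceSplitter handles properly; `chunk_words` is a hypothetical helper.

```python
def chunk_words(text: str, chunk_size: int = 1024, chunk_overlap: int = 20) -> list[str]:
    """Split text into windows of chunk_size words, each sharing chunk_overlap
    words with the previous window. Words stand in for tokens here."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    words = text.split()
    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start : start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this window already reached the end of the text
    return chunks


long_text = " ".join(f"w{i}" for i in range(2500))
chunks = chunk_words(long_text, chunk_size=1024, chunk_overlap=20)
print(len(chunks), [len(c.split()) for c in chunks])  # 3 [1024, 1024, 492]
```

The overlap means the last 20 words of each chunk reappear at the start of the next, so text near a boundary keeps some of its surrounding context in both neighboring chunks.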

Experienced dev note

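Tree summarize is often dismissed as 'just recursive summarization' by developers new to LLM workflows, but the real power is in context economics: no single call ever has to hold the whole corpus. A 100-document dataset that would need a 50k-token prompt for flat summarization can be processed in calls of a few thousand tokens each. Total tokens do not shrink, though: every original token is still read once at the leaf level, and the intermediate summaries add a little on top. The secondary win is that errors stay local: a weak summary in one batch affects only its own branch, whereas sequential refinement carries an early mistake through every later step. The tree is not self-correcting, however: a detail dropped at a leaf cannot be recovered higher up. In production, pair tree summarize with deterministic batch ordering (sort documents by ID before batching) and a temperature of 0 so results are as reproducible as possible across runs.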
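A rough token count makes the trade-off concrete. The figures are assumptions chosen for illustration (100 documents of 500 tokens, 200-token summaries, batches of 5), prompt-template and output tokens are ignored, and `tree_token_cost` is a hypothetical helper.

```python
def tree_token_cost(num_docs: int, doc_tokens: int, summary_tokens: int,
                    batch_size: int) -> tuple[int, int, int]:
    """Return (total_input_tokens, largest_call_input, num_calls) for a fixed-batch
    summary tree. Prompt-template and output tokens are ignored (assumption)."""
    if batch_size < 2:
        raise ValueError("batch_size must be at least 2")
    total = largest = calls = 0
    sizes = [doc_tokens] * num_docs  # token size of each text at the current level
    while len(sizes) > 1:
        next_sizes = []
        for i in range(0, len(sizes), batch_size):
            call_input = sum(sizes[i : i + batch_size])  # one LLM call per batch
            total += call_input
            largest = max(largest, call_input)
            calls += 1
            next_sizes.append(summary_tokens)  # each call emits one summary
        sizes = next_sizes
    return total, largest, calls


print(tree_token_cost(100, 500, 200, 5))  # (54800, 2500, 25)
print("flat, single call:", 100 * 500)    # flat, single call: 50000
```

Under these assumptions the tree reads slightly more tokens in total (54,800 versus 50,000) across 25 calls, but no single call sees more than 2,500 input tokens, while the flat approach needs one 50,000-token prompt.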

Check your understanding

Why does tree summarize not simply concatenate all document text and send it to the LLM in a single call? What specific problem does the hierarchical approach solve that a naive concatenation approach does not?

Answer hint

A correct answer must mention the context-window limit as the primary constraint and explain how hierarchical batching keeps every individual call under that limit through progressive compression: not just 'it's faster', but the arithmetic of how many calls and levels it takes to process 100 documents in layers versus one oversized all-at-once prompt.

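Version note: TreeSummarize moved from llama_index.response_synthesizers to llama_index.core.response_synthesizers in 0.10.0. If you are on llama-index < 0.10.0, import it from llama_index.response_synthesizers instead. In those versions the OpenAI LLM is imported from llama_index.llms, and the global Settings object does not exist yet (the LLM is configured through ServiceContext).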

Next, explore Refine synthesis: the sequential alternative to tree summarize that iteratively builds summaries by processing one document at a time, trading latency for better detail preservation.
