How to summarize long documents with an LLM
Quick answer
Use a chunking strategy to split a long document into manageable parts, then summarize each chunk with gpt-4o or a similar model via the OpenAI SDK. Finally, combine the chunk summaries into a concise overall summary with one more LLM call.
Prerequisites
- Python 3.8+
- OpenAI API key (free tier works)
- pip install openai>=1.0
Setup
Install the openai Python package and set your OpenAI API key as an environment variable.
- Run pip install openai to install the SDK.
- Set your API key in your shell: export OPENAI_API_KEY='your_api_key_here' (Linux/macOS) or setx OPENAI_API_KEY "your_api_key_here" (Windows).
pip install openai output:
Collecting openai
  Downloading openai-1.x.x-py3-none-any.whl
Installing collected packages: openai
Successfully installed openai-1.x.x
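Before making any API calls, it can help to confirm the key is actually visible to Python. The helper below is a minimal sketch (the function name is ours, not part of the SDK):

```python
import os

def api_key_configured() -> bool:
    """Return True if OPENAI_API_KEY is set to a non-empty value."""
    return bool(os.environ.get("OPENAI_API_KEY"))

print("API key configured:", api_key_configured())
```

If this prints False, the examples below will fail at client construction with an authentication error.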
Step by step
This example demonstrates chunking a long text, summarizing each chunk with gpt-4o, then combining those summaries into a final summary.
```python
import os
from openai import OpenAI

# Initialize client
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

# Example long document (replace with your own text)
long_text = """\
Artificial intelligence (AI) is transforming industries by enabling machines to perform tasks that typically require human intelligence. This includes natural language processing, computer vision, and decision-making. However, processing very long documents can exceed model context limits, so chunking is essential. By splitting the document into smaller parts, we can summarize each part individually and then synthesize a final summary.
"""

# Split text into chunks; counts words as a rough proxy for tokens
def chunk_text(text, max_tokens=500):
    words = text.split()
    chunks = []
    current_chunk = []
    current_length = 0
    for word in words:
        current_chunk.append(word)
        current_length += 1
        if current_length >= max_tokens:
            chunks.append(" ".join(current_chunk))
            current_chunk = []
            current_length = 0
    if current_chunk:
        chunks.append(" ".join(current_chunk))
    return chunks

chunks = chunk_text(long_text, max_tokens=50)  # small chunks for demo

# Summarize each chunk
chunk_summaries = []
for i, chunk in enumerate(chunks):
    messages = [
        {"role": "user", "content": f"Summarize this text:\n{chunk}"}
    ]
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=messages
    )
    summary = response.choices[0].message.content.strip()
    chunk_summaries.append(summary)
    print(f"Chunk {i+1} summary:\n{summary}\n")

# Combine chunk summaries into a final summary
combined_text = "\n".join(chunk_summaries)
final_messages = [
    {"role": "user", "content": f"Summarize the following summaries into a concise overall summary:\n{combined_text}"}
]
final_response = client.chat.completions.create(
    model="gpt-4o",
    messages=final_messages
)
final_summary = final_response.choices[0].message.content.strip()
print("Final summary:\n", final_summary)
```

output:
Chunk 1 summary:
Artificial intelligence enables machines to perform tasks requiring human intelligence, such as language processing and decision-making.

Final summary:
Artificial intelligence transforms industries by enabling machines to perform complex tasks like natural language processing and decision-making, with chunking helping to summarize long documents effectively.
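The chunker above starts each chunk exactly where the previous one ends, so a sentence can be cut in half with no shared context between chunks. A common variant (a sketch we are adding, not part of the original example) overlaps consecutive chunks by a fixed number of words so each summary sees a little of its neighbor's text:

```python
def chunk_text_with_overlap(text, max_words=500, overlap=50):
    """Split text into word-based chunks where consecutive chunks share `overlap` words."""
    if overlap >= max_words:
        raise ValueError("overlap must be smaller than max_words")
    words = text.split()
    step = max_words - overlap  # how far the window advances each iteration
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break  # the last window already covers the tail of the text
    return chunks
```

Larger overlaps improve coherence at chunk boundaries at the cost of more tokens (and more API spend) per document.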
Common variations
You can adapt this approach by:
- Using async calls with asyncio and await for faster parallel chunk summarization.
- Streaming partial summaries with stream=True for real-time output.
- Using other models like gpt-4o-mini for cost savings, or claude-3-5-sonnet-20241022 via the Anthropic SDK.
- Adjusting chunk size based on the token limits of your chosen model.
```python
import asyncio
import os
from openai import AsyncOpenAI

# The async client exposes the same API as OpenAI, but its methods are awaitable
client = AsyncOpenAI(api_key=os.environ["OPENAI_API_KEY"])

async def summarize_chunk(chunk):
    response = await client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": f"Summarize this text:\n{chunk}"}]
    )
    return response.choices[0].message.content.strip()

async def main():
    chunks = ["Text chunk 1", "Text chunk 2"]
    # Summarize all chunks concurrently
    summaries = await asyncio.gather(*(summarize_chunk(c) for c in chunks))
    combined = "\n".join(summaries)
    final = await client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": f"Summarize these summaries:\n{combined}"}]
    )
    print("Final summary:", final.choices[0].message.content.strip())

asyncio.run(main())
```

output:
Final summary: [Concise summary of the combined chunks]
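For the stream=True variation, the response arrives as a sequence of chunks carrying content deltas. A small accumulator helper (our sketch; the client call is shown in a comment because it needs a live API key) turns the stream into a full string while you print tokens as they arrive:

```python
def collect_stream(stream):
    """Accumulate the content deltas of a streamed chat completion into one string."""
    parts = []
    for chunk in stream:
        delta = chunk.choices[0].delta.content
        if delta:  # the final chunk's delta content is None
            print(delta, end="", flush=True)  # show partial output in real time
            parts.append(delta)
    return "".join(parts)

# Usage with a live client (requires a valid API key):
# stream = client.chat.completions.create(
#     model="gpt-4o-mini",
#     messages=[{"role": "user", "content": "Summarize this text:\n..."}],
#     stream=True,
# )
# summary = collect_stream(stream)
```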
Troubleshooting
- If you get context length exceeded errors, reduce the chunk size or switch to a model with a larger context window.
- If summaries are too generic, provide more detailed instructions in the prompt.
- For rate limit errors, add retry logic with exponential backoff.
- Ensure your API key is correctly set in os.environ["OPENAI_API_KEY"].
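The retry advice above can be sketched as a small generic wrapper (the helper name is ours). In the OpenAI SDK, rate-limited requests raise openai.RateLimitError, which you would pass as retry_on:

```python
import time

def with_retries(call, max_attempts=5, base_delay=1.0, retry_on=(Exception,)):
    """Retry `call` with exponential backoff: base_delay, 2x, 4x, ... between attempts."""
    for attempt in range(max_attempts):
        try:
            return call()
        except retry_on:
            if attempt == max_attempts - 1:
                raise  # out of attempts; surface the error to the caller
            time.sleep(base_delay * (2 ** attempt))

# Example wiring (assumes a configured client):
# summary = with_retries(
#     lambda: client.chat.completions.create(model="gpt-4o", messages=messages),
#     retry_on=(openai.RateLimitError,),
# )
```

Adding random jitter to the delay is a common refinement when many workers retry at once.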
Key takeaways
- Split long documents into chunks within model token limits before summarizing.
- Summarize each chunk individually, then combine summaries for a final concise output.
- Use async calls or streaming for efficiency and responsiveness.
- Adjust chunk size and model choice based on your use case and cost constraints.
- Handle API errors by checking context limits, rate limits, and prompt clarity.