How to · beginner · 4 min read

How to use LangChain for document summarization

Quick answer
Use LangChain with a document loader like PyPDFLoader and a chat model such as ChatOpenAI to load and summarize documents. A summarization chain ties the loaded documents and the model together to generate concise summaries efficiently.

PREREQUISITES

  • Python 3.8+
  • OpenAI API key (free tier works)
  • pip install langchain langchain-openai langchain-community pypdf

Setup

Install the required packages and set your OpenAI API key as an environment variable.

bash
pip install langchain langchain-openai langchain-community pypdf
output
Collecting langchain
Collecting langchain-openai
Collecting langchain-community
Collecting pypdf
Successfully installed langchain langchain-openai langchain-community pypdf
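Then export your API key so the code below can read it from the environment (the key value shown is a placeholder — use your own):

```shell
# Set the API key for the current shell session (placeholder value)
export OPENAI_API_KEY="sk-..."
```

Add the line to your shell profile if you want it to persist across sessions.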

Step by step

This example loads a PDF document, uses ChatOpenAI with gpt-4o-mini to summarize the content, and prints the summary.

python
import os
from langchain_openai import ChatOpenAI
from langchain_community.document_loaders import PyPDFLoader
from langchain.chains.summarize import load_summarize_chain

# Set your OpenAI API key in environment variable OPENAI_API_KEY
client = ChatOpenAI(model="gpt-4o-mini", temperature=0, api_key=os.environ["OPENAI_API_KEY"])

# Load the PDF document
loader = PyPDFLoader("example.pdf")
docs = loader.load()

# Create a summarization chain
chain = load_summarize_chain(client, chain_type="map_reduce")

# Run the chain on the loaded documents (invoke returns a dict; the
# summary is under the "output_text" key)
summary = chain.invoke({"input_documents": docs})["output_text"]

print("Summary:\n", summary)
output
Summary:
 This document explains the key concepts of LangChain for document summarization, including loading documents, using chat models, and chaining components for efficient summarization.

Common variations

  • Use chain_type="stuff" for smaller documents to summarize all at once.
  • Run asynchronously by defining an async function and awaiting the chain, e.g. await chain.ainvoke(...).
  • Swap in a larger model such as gpt-4o when you need higher-quality summaries; gpt-4o-mini (used here) is faster and cheaper.
  • Use other loaders like TextLoader for plain text files.
python
import asyncio
import os
from langchain_openai import ChatOpenAI
from langchain_community.document_loaders import TextLoader
from langchain.chains.summarize import load_summarize_chain

async def async_summarize():
    client = ChatOpenAI(model="gpt-4o-mini", temperature=0, api_key=os.environ["OPENAI_API_KEY"])
    loader = TextLoader("example.txt")
    docs = loader.load()
    chain = load_summarize_chain(client, chain_type="stuff")
    result = await chain.ainvoke({"input_documents": docs})
    print("Async summary:\n", result["output_text"])

asyncio.run(async_summarize())
output
Async summary:
 This text file covers the basics of LangChain document summarization using the gpt-4o-mini model asynchronously.

Troubleshooting

  • If you get an authentication error, verify your OPENAI_API_KEY environment variable is set correctly.
  • If the document is too large, use chain_type="map_reduce" to process in chunks.
  • For slow responses, reduce the amount of input text per call or set a lower max_tokens on the model.
  • Ensure your document path is correct to avoid file not found errors.
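To see why map_reduce handles large documents, here is a minimal pure-Python sketch of the idea: split the text into overlapping chunks, summarize each chunk (the "map" step), then summarize the combined summaries (the "reduce" step). The summarize_stub function is a placeholder — a real chain calls the chat model there:

```python
def split_into_chunks(text, chunk_size=1000, overlap=100):
    """Split text into overlapping chunks, as a text splitter would."""
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap
    return chunks

def summarize_stub(text):
    # Placeholder: a real implementation calls the LLM here.
    return text[:50]

def map_reduce_summary(text):
    # Map: summarize each chunk independently.
    chunk_summaries = [summarize_stub(c) for c in split_into_chunks(text)]
    # Reduce: summarize the concatenated chunk summaries.
    return summarize_stub(" ".join(chunk_summaries))

print(map_reduce_summary("LangChain document. " * 200))
```

Because each chunk is summarized independently, no single model call ever sees the full document, which is what keeps large inputs under the context limit.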

Key Takeaways

  • Use LangChain's document loaders and chat models to build efficient summarization pipelines.
  • Choose the right chain type based on document size: 'stuff' for small, 'map_reduce' for large.
  • Async support enables scalable summarization workflows with LangChain and OpenAI.
  • Always set your API key securely via environment variables to avoid authentication issues.
Verified 2026-04 · gpt-4o-mini