
How to load PDF in LangChain

Quick answer

Use PyPDFLoader from langchain_community.document_loaders to load PDF files in LangChain. Instantiate it with the path to the PDF, then call load() to get a list of LangChain Document objects, one per page. PyPDFLoader relies on the pypdf package, so install that as well.

Prerequisites

Python 3.8+
pip install langchain langchain-community pypdf
A PDF file to load

Setup

Install LangChain, the community loaders, and pypdf, which PyPDFLoader uses to parse PDF files.

bash

pip install langchain langchain-community pypdf

Step by step

Use PyPDFLoader to load a PDF file. load() returns a list of LangChain Document objects, one per page, each holding that page's extracted text in page_content.

python

from langchain_community.document_loaders import PyPDFLoader

# Path to your PDF file
pdf_path = "example.pdf"

# Initialize the loader
loader = PyPDFLoader(pdf_path)

# Load the documents
documents = loader.load()

# Print the first page content
print(documents[0].page_content)

output

This is the text content of the first page of example.pdf...
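
Each Document also carries a metadata dictionary. The sketch below is a minimal, hedged example of summarizing loaded pages; it assumes the loader sets a "page" key in metadata (PyPDFLoader normally does, as a zero-based index) and falls back to None if it is absent. The helper name summarize_pages is hypothetical and not part of LangChain.

```python
from typing import Iterable, List, Tuple


def summarize_pages(documents: Iterable) -> List[Tuple[object, int]]:
    """Return a (page, character count) pair for each loaded document.

    Works on any object exposing `page_content` (str) and `metadata` (dict),
    which is the shape of a LangChain Document. The "page" key is an
    assumption about what the loader stores; a missing key yields None.
    """
    return [
        (doc.metadata.get("page"), len(doc.page_content))
        for doc in documents
    ]
```

With the documents list loaded earlier, summarize_pages(documents) gives a quick view of how much text each page yielded, which helps spot pages where extraction returned little or nothing.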

Common variations

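Use load_and_split() to load the PDF and split it into smaller chunks in one call; by default it uses a RecursiveCharacterTextSplitter.
Use another loader such as UnstructuredPDFLoader for PDFs with complex layouts; it needs the unstructured package installed.
Combine the loaded documents with LangChain text splitters and embeddings for downstream tasks such as retrieval.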

python

from langchain_community.document_loaders import PyPDFLoader

loader = PyPDFLoader("example.pdf")
docs = loader.load_and_split()
print(f"Loaded {len(docs)} chunks from PDF")

output

Loaded 10 chunks from PDF
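
To make the idea of chunking concrete, here is a plain-Python sketch of fixed-size splitting with overlap. It is an illustration only and not the algorithm LangChain uses: LangChain's splitters also try to break on separators such as paragraphs and sentences. The function name chunk_text and its default sizes are assumptions chosen for the example.

```python
from typing import List


def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 200) -> List[str]:
    """Split text into character windows of chunk_size that overlap by `overlap`."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be at least 0 and less than chunk_size")

    step = chunk_size - overlap  # how far the window advances each time
    chunks: List[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window already reaches the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks
```

For example, chunk_text("abcdefghij", chunk_size=4, overlap=1) returns ["abcd", "defg", "ghij"]: each chunk repeats the last character of the previous one, so context is not lost at chunk boundaries.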

Troubleshooting

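If loading fails, verify that the file path is correct and that the PDF is not corrupted.
If PyPDFLoader raises an ImportError mentioning pypdf, install it with pip install pypdf.
Scanned PDFs store pages as images, so text extraction may return empty or garbled text; run OCR on the file first.
Upgrade langchain_community to pick up the latest loader fixes.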
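
The first checks above can be scripted. The sketch below uses only the standard library; looks_like_pdf and empty_page_indexes are hypothetical helper names, not LangChain APIs. The header check is a heuristic that assumes the file starts with the bytes %PDF-, which holds for most PDFs but not for files with leading junk bytes.

```python
from pathlib import Path
from typing import Iterable, List


def looks_like_pdf(path: str) -> bool:
    """Return True if path is an existing file that starts with the PDF header bytes."""
    p = Path(path)
    if not p.is_file():
        return False
    with p.open("rb") as fh:
        # Heuristic: most PDF files begin with "%PDF-" followed by a version.
        return fh.read(5) == b"%PDF-"


def empty_page_indexes(documents: Iterable) -> List[int]:
    """Return positions of documents whose page_content is empty or whitespace.

    Many empty pages usually mean a scanned PDF that needs OCR.
    """
    return [
        i for i, doc in enumerate(documents)
        if not doc.page_content.strip()
    ]
```

Call looks_like_pdf("example.pdf") before constructing the loader, and empty_page_indexes(documents) after load() to see which pages produced no text.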

Key Takeaways

Use PyPDFLoader from langchain_community.document_loaders to load PDFs; it requires the pypdf package.
Call load() for one Document per page, or load_and_split() for chunked text.
Check PDF integrity and consider OCR for scanned documents before loading.
Keep langchain_community updated for best compatibility and features.

Verified 2026-04
