How to use Claude for PDF analysis
Quick answer
Use the anthropic.Anthropic client from the Anthropic Python SDK to send extracted PDF text as messages to claude-3-5-sonnet-20241022. Extract the PDF text with a library such as PyPDF2 or pdfplumber, then pass the content in the messages parameter for analysis.
Prerequisites
- Python 3.8+
- Anthropic API key
- pip install "anthropic>=0.20"
- pip install PyPDF2 (or pdfplumber)
Setup
Install the required Python packages for PDF extraction and Anthropic API access. Set your Anthropic API key as an environment variable.
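One way to fail early when the key is missing is a small lookup helper. This is a minimal sketch: get_api_key is a hypothetical helper name used for illustration, not part of the Anthropic SDK.

```python
import os


def get_api_key(env=None):
    """Return the Anthropic API key from the environment, or raise a clear error.

    get_api_key is a hypothetical helper for illustration, not an SDK function.
    """
    env = os.environ if env is None else env
    key = env.get("ANTHROPIC_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "ANTHROPIC_API_KEY is not set. Export it in your shell before running the script."
        )
    return key
```

The returned value can be passed as api_key when constructing the client. Taking the environment as a parameter keeps the helper easy to check without touching the real shell environment.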
pip install anthropic PyPDF2
Step by step
Extract text from a PDF file using PyPDF2 and send it to Claude for analysis with the Anthropic SDK.
import os
import anthropic
from PyPDF2 import PdfReader

# Initialize the Anthropic client with the key from the environment
client = anthropic.Anthropic(api_key=os.environ["ANTHROPIC_API_KEY"])

# Extract the text of every page in a PDF
def extract_pdf_text(pdf_path):
    reader = PdfReader(pdf_path)
    text = []
    for page in reader.pages:
        # Fall back to "" so a page with no extractable text cannot break the join
        text.append(page.extract_text() or "")
    return "\n".join(text)

# Extract text from your PDF
pdf_text = extract_pdf_text("sample.pdf")

# Prepare the prompts for Claude
system_prompt = "You are a helpful assistant that analyzes PDF documents."
user_prompt = f"Analyze the following PDF content:\n{pdf_text[:3000]}"  # Limit to first 3000 chars

# Send the request to Claude
response = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=1024,
    system=system_prompt,
    messages=[{"role": "user", "content": user_prompt}],
)

# The reply is a list of content blocks; the first one holds the text
print(response.content[0].text)
Output
Claude's analysis of the PDF content...
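The request above caps the reply at max_tokens=1024, so a long analysis can stop mid-sentence. The Messages API response carries a stop_reason field that is "max_tokens" when that happens. The sketch below checks it; was_truncated is a hypothetical helper name, and the SimpleNamespace object only stands in for a real response.

```python
from types import SimpleNamespace


def was_truncated(response):
    """Return True if the reply stopped because it hit the max_tokens limit."""
    return getattr(response, "stop_reason", None) == "max_tokens"


# Stand-in for illustration; a real response comes from client.messages.create(...)
fake_response = SimpleNamespace(stop_reason="max_tokens")
if was_truncated(fake_response):
    print("Reply was cut off; consider raising max_tokens or sending less text.")
```

With a real response, the same check can decide whether to retry with a larger max_tokens value.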
Common variations
- Use pdfplumber, which can give more accurate text extraction on some PDFs.
- Send smaller chunks of PDF text in multiple messages if the document is large.
- Adjust max_tokens to control response length.
- Use async calls with anthropic.AsyncAnthropic if needed.
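Sending smaller chunks could be sketched as follows. chunk_text is a hypothetical helper, and the 3000-character size and 200-character overlap are illustrative assumptions, not recommended values. Character counts only approximate token counts.

```python
def chunk_text(text, max_chars=3000, overlap=200):
    """Split text into pieces of at most max_chars, each overlapping the
    previous piece by `overlap` characters so sentences cut at a boundary
    still appear whole in one chunk.

    chunk_text is a hypothetical helper for illustration.
    """
    if max_chars <= 0:
        raise ValueError("max_chars must be positive")
    if not 0 <= overlap < max_chars:
        raise ValueError("overlap must be at least 0 and less than max_chars")

    step = max_chars - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + max_chars])
        # Stop once this chunk has reached the end of the text
        if start + max_chars >= len(text):
            break
    return chunks
```

Each chunk can then go into its own messages.create call, with the per-chunk analyses combined afterwards. How to combine them depends on the task and is left open here.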
Troubleshooting
- If you get truncated or incomplete responses, reduce the input text size or increase max_tokens.
- If PDF text extraction returns None or empty strings, try switching from PyPDF2 to pdfplumber.
- Ensure your ANTHROPIC_API_KEY environment variable is set correctly.
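The switch to pdfplumber might look like the sketch below. It assumes pdfplumber is installed (pip install pdfplumber). is_blank and extract_with_pdfplumber are hypothetical helper names introduced for illustration.

```python
def is_blank(text):
    """True when an extractor returned None or only whitespace."""
    return text is None or not text.strip()


def extract_with_pdfplumber(pdf_path):
    """Extract text page by page with pdfplumber."""
    import pdfplumber  # imported lazily so is_blank works without pdfplumber installed

    with pdfplumber.open(pdf_path) as pdf:
        # Fall back to "" for pages where extract_text() yields nothing
        pages = [page.extract_text() or "" for page in pdf.pages]
    return "\n".join(pages)


# Possible usage alongside the PyPDF2-based extract_pdf_text from the example:
#   text = extract_pdf_text("sample.pdf")
#   if is_blank(text):
#       text = extract_with_pdfplumber("sample.pdf")
```

If both libraries return blank text, the PDF may contain only scanned images with no text layer, in which case neither extractor can help.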
Key Takeaways
- Extract PDF text first using libraries like PyPDF2 before sending to Claude.
- Use the Anthropic SDK with the system prompt to guide Claude's PDF analysis.
- Limit input size to avoid token limits and control response length with max_tokens.
- Switch PDF extraction libraries if text extraction quality is poor.
- Set your API key securely via environment variables to authenticate Anthropic requests.