How to use Claude for PDF analysis

Quick answer
Use the Anthropic Python SDK (the anthropic.Anthropic client) to send extracted PDF text to claude-3-5-sonnet-20241022. Extract the text with a library such as PyPDF2 or pdfplumber, then pass it as the content of a user message in the messages parameter.

Prerequisites

  • Python 3.8+
  • Anthropic API key
  • pip install "anthropic>=0.20" (quoted so the shell does not treat >= as a redirect)
  • pip install PyPDF2 (or pip install pdfplumber)

Setup

Install the required Python packages for PDF extraction and Anthropic API access. Set your Anthropic API key as an environment variable.

bash
pip install anthropic PyPDF2
# Replace the placeholder with your own key
export ANTHROPIC_API_KEY="your-api-key"
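
Checking the environment variable before the client is created makes a missing key fail with a clear message instead of a bare KeyError. The following is a minimal sketch; get_api_key is a hypothetical helper for illustration, not part of the Anthropic SDK.

```python
import os


def get_api_key(env=None):
    """Return the Anthropic API key, or raise with a readable message.

    `env` defaults to os.environ; passing a dict makes the check easy to test.
    """
    env = os.environ if env is None else env
    key = env.get("ANTHROPIC_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "ANTHROPIC_API_KEY is not set; export it in your shell before running the script."
        )
    return key
```

The returned value can then be passed as api_key when constructing anthropic.Anthropic.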

Step by step

Extract text from a PDF file using PyPDF2 and send it to Claude for analysis with the Anthropic SDK.

python
import os
import anthropic
from PyPDF2 import PdfReader

# Initialize Anthropic client
client = anthropic.Anthropic(api_key=os.environ["ANTHROPIC_API_KEY"])

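# Extract the text of every page in the PDF.
# extract_text() can yield None or an empty string for pages without a
# text layer, so fall back to "" to keep the join from failing.
def extract_pdf_text(pdf_path):
    reader = PdfReader(pdf_path)
    text = []
    for page in reader.pages:
        text.append(page.extract_text() or "")
    return "\n".join(text)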

# Extract text from your PDF
pdf_text = extract_pdf_text("sample.pdf")

# Prepare prompt for Claude
system_prompt = "You are a helpful assistant that analyzes PDF documents."
user_prompt = f"Analyze the following PDF content:\n{pdf_text[:3000]}"  # Limit to first 3000 chars

# Send request to Claude
response = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=1024,
    system=system_prompt,
    messages=[{"role": "user", "content": user_prompt}]
)

print(response.content[0].text)

output
Claude's analysis of the PDF content...

Common variations

  • Use pdfplumber when PyPDF2 output is poor; it can give better results on PDFs with tables or complex layouts.
  • Send smaller chunks of PDF text in multiple messages if the document is large.
  • Adjust max_tokens to control response length.
  • Use async calls with anthropic.AsyncAnthropic (the asynchronous counterpart of anthropic.Anthropic) if needed.
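
The chunking and async variations can be combined. The sketch below splits the extracted text on blank lines into pieces of at most max_chars characters, then sends one request per piece with anthropic.AsyncAnthropic. The helper names chunk_text and analyze_chunks, the 3000-character default, and the prompt wording are assumptions made for illustration.

```python
import asyncio


def chunk_text(text, max_chars=3000):
    """Split text into chunks of at most max_chars, breaking on blank lines where possible."""
    chunks, current = [], ""
    for paragraph in text.split("\n\n"):
        # Hard-split any single paragraph that is longer than the limit
        while len(paragraph) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(paragraph[:max_chars])
            paragraph = paragraph[max_chars:]
        candidate = f"{current}\n\n{paragraph}" if current else paragraph
        if len(candidate) <= max_chars:
            current = candidate
        else:
            chunks.append(current)
            current = paragraph
    if current:
        chunks.append(current)
    return chunks


async def analyze_chunks(chunks, model="claude-3-5-sonnet-20241022"):
    """Send each chunk as its own request and return the replies in order."""
    import anthropic  # imported here so chunk_text works without the SDK installed

    client = anthropic.AsyncAnthropic()  # reads ANTHROPIC_API_KEY from the environment

    async def analyze(index, chunk):
        response = await client.messages.create(
            model=model,
            max_tokens=1024,
            messages=[
                {"role": "user", "content": f"Analyze part {index + 1} of a PDF:\n{chunk}"}
            ],
        )
        return response.content[0].text

    # Run all requests concurrently; results keep the order of the chunks
    return await asyncio.gather(*(analyze(i, c) for i, c in enumerate(chunks)))
```

A call such as asyncio.run(analyze_chunks(chunk_text(pdf_text))) would return one analysis per chunk. For long documents, many concurrent requests may run into rate limits, so sending the chunks in smaller batches may be safer.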

Troubleshooting

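  • If a response is cut off before it finishes, increase max_tokens; response.stop_reason is "max_tokens" when the limit was hit. If the request fails because the input is too long, reduce the input text size.
  • If PDF text extraction returns None or empty strings, try switching from PyPDF2 to pdfplumber. Scanned PDFs with no text layer need OCR before either library can return text.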
  • Ensure your ANTHROPIC_API_KEY environment variable is set correctly.
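
The first two checks above can be sketched as small helpers. is_truncated inspects the stop_reason field of a Messages API response, and extract_with_fallback tries one extractor and switches to pdfplumber when no usable text comes back. All three function names are hypothetical, and the pdfplumber calls assume the package is installed.

```python
def is_truncated(response):
    """True when the reply stopped because it reached max_tokens."""
    return response.stop_reason == "max_tokens"


def extract_with_pdfplumber(pdf_path):
    """Extract text page by page with pdfplumber; pages without text become empty strings."""
    import pdfplumber  # imported lazily; requires `pip install pdfplumber`

    with pdfplumber.open(pdf_path) as pdf:
        return "\n".join(page.extract_text() or "" for page in pdf.pages)


def extract_with_fallback(pdf_path, primary, fallback=extract_with_pdfplumber):
    """Try the primary extractor and fall back when it yields no usable text."""
    text = primary(pdf_path) or ""
    if text.strip():
        return text
    return fallback(pdf_path) or ""
```

With the extract_pdf_text function from the main example, extract_with_fallback("sample.pdf", extract_pdf_text) would only open the file with pdfplumber when PyPDF2 returns nothing but whitespace.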

Key Takeaways

  • Extract PDF text first using libraries like PyPDF2 before sending to Claude.
  • Use the system parameter of the Anthropic SDK's messages.create call to guide Claude's PDF analysis.
  • Limit input size to avoid token limits and control response length with max_tokens.
  • Switch PDF extraction libraries if text extraction quality is poor.
  • Set your API key securely via environment variables to authenticate Anthropic requests.
Verified 2026-04 · claude-3-5-sonnet-20241022