How to use Claude for document analysis
Quick answer
Use the Anthropic Python SDK to send document text to a Claude model such as `claude-3-5-sonnet-20241022` for analysis. Extract the text from your documents first, then pass it to the model in a user message to get summaries, insights, or structured data.

Prerequisites
- Python 3.8+
- An Anthropic API key
- `pip install "anthropic>=0.20"`
Setup
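Install the `anthropic` Python SDK and store your API key in the `ANTHROPIC_API_KEY` environment variable so it stays out of your source code. Quote the version specifier; otherwise the shell treats `>` as output redirection.

```bash
pip install "anthropic>=0.20"
export ANTHROPIC_API_KEY="your-api-key"
```

Step by step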
Extract the text from your document (e.g., a PDF or TXT file), then send it to Claude for analysis with `client.messages.create`. The complete example below summarizes a short document; the text is hardcoded to keep the example self-contained.
```python
import os

import anthropic

# Create a client; the key comes from the ANTHROPIC_API_KEY environment variable
client = anthropic.Anthropic(api_key=os.environ["ANTHROPIC_API_KEY"])

# Text already extracted from your document
document_text = """Artificial intelligence (AI) is transforming industries by enabling machines to learn from data and perform tasks that typically require human intelligence. Document analysis with AI can extract insights, summarize content, and classify information efficiently."""

# The assistant persona goes in `system`, so the user prompt only carries the task
prompt = f"Please provide a concise summary of the following document text:\n\n{document_text}"

response = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=300,  # upper bound on the length of the reply
    system="You are a helpful assistant.",
    messages=[{"role": "user", "content": prompt}],
)

# response.content is a list of content blocks; the first holds the text reply
print("Summary:\n" + response.content[0].text)
```

Example output (the exact wording varies between runs):

```text
Summary:
Artificial intelligence (AI) enables machines to learn from data and perform tasks requiring human intelligence. It transforms industries by extracting insights, summarizing content, and classifying information efficiently.
```
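The example above hardcodes `document_text`. For plain-text files, the extraction step can be as small as the sketch below, which assumes UTF-8 input and substitutes a replacement character for undecodable bytes. It also wraps the text in XML-style tags and puts the instruction after the document, loosely following the layout in Anthropic's long-context prompting tips. The helper names (`load_text_document`, `build_analysis_prompt`) and the tag names are illustrative choices, not part of the SDK.

```python
import tempfile
from pathlib import Path


def load_text_document(path: str) -> str:
    """Read a plain-text file as UTF-8, replacing undecodable bytes with U+FFFD."""
    return Path(path).read_text(encoding="utf-8", errors="replace")


def build_analysis_prompt(document_text: str, instruction: str, source: str = "document") -> str:
    """Wrap the document in XML-style tags (illustrative names) and put the instruction after it."""
    return (
        "<document>\n"
        f"<source>{source}</source>\n"
        f"<document_content>\n{document_text}\n</document_content>\n"
        "</document>\n\n"
        f"{instruction}"
    )


if __name__ == "__main__":
    # Write a small sample file so the example runs without a real document
    with tempfile.TemporaryDirectory() as tmp_dir:
        sample_path = Path(tmp_dir) / "report.txt"
        sample_path.write_text("AI is transforming industries.", encoding="utf-8")
        text = load_text_document(str(sample_path))
        prompt = build_analysis_prompt(
            text, "Summarize the document above in two sentences.", source=sample_path.name
        )
        print(prompt)
```

Pass the returned string as the `content` of the user message in place of `prompt`. PDFs need their own extraction step (a PDF library or a document loader) before this helper applies.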
Common variations
- Swap in a different Claude model: `claude-3-haiku-20240307` is faster and cheaper, while `claude-3-opus-20240229` is a larger, slower, and more expensive model.
- Process documents asynchronously with Python's `asyncio` and the SDK's `AsyncAnthropic` client.
- Automate text extraction with a document loader (e.g., `PyPDFLoader` from LangChain) before analysis.
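To get structured data instead of prose, one approach is to ask for JSON in the prompt and parse the reply. This sketch assumes the model follows the requested format; `EXTRACTION_INSTRUCTION` and `parse_json_reply` are hypothetical names, and the field list is only an example. Because a reply may include a sentence or a code fence around the JSON, the parser slices from the first `{` to the last `}` before calling `json.loads`.

```python
import json

# Hypothetical instruction; change the fields to whatever you need to extract
EXTRACTION_INSTRUCTION = (
    "Extract the document's title, a one-sentence summary, and up to five key topics. "
    "Respond with only a JSON object of the form "
    '{"title": "...", "summary": "...", "topics": ["..."]}.'
)


def parse_json_reply(reply_text: str) -> dict:
    """Parse the JSON object in a model reply, ignoring any text or code fence around it.

    Raises ValueError (json.JSONDecodeError is a subclass) if no valid object is found.
    """
    start = reply_text.find("{")
    end = reply_text.rfind("}")
    if start == -1 or end < start:
        raise ValueError("no JSON object found in reply")
    return json.loads(reply_text[start : end + 1])


if __name__ == "__main__":
    # Stand-in for response.content[0].text from a real API call
    sample_reply = (
        "Here is the extracted data:\n"
        '{"title": "AI overview", "summary": "AI is transforming industries.", '
        '"topics": ["AI", "document analysis"]}'
    )
    data = parse_json_reply(sample_reply)
    print(data["topics"])  # ['AI', 'document analysis']
```

In a real call, prepend `EXTRACTION_INSTRUCTION` to the document text, send it with `client.messages.create`, and pass `response.content[0].text` to `parse_json_reply`, catching `ValueError` so a malformed reply can be retried or logged.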
An asynchronous version uses `anthropic.AsyncAnthropic`, whose `messages.create` method is awaited:

```python
import asyncio
import os

import anthropic


async def analyze_document_async(text):
    # AsyncAnthropic is the asyncio client; its API methods return awaitables
    client = anthropic.AsyncAnthropic(api_key=os.environ["ANTHROPIC_API_KEY"])
    prompt = f"Summarize this document:\n\n{text}"
    response = await client.messages.create(
        model="claude-3-opus-20240229",
        max_tokens=300,
        system="You are a helpful assistant.",
        messages=[{"role": "user", "content": prompt}],
    )
    print("Async summary:\n" + response.content[0].text)


asyncio.run(analyze_document_async("Sample document text here."))
```

Example output:

```text
Async summary:
Sample document text here summarized concisely.
```
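With many documents (or many chunks of one document), the async client lets requests overlap. The sketch below, assuming you supply an async `analyze` function, caps the number of in-flight requests with `asyncio.Semaphore` so a large batch does not fire everything at once. `analyze_many` and `summarize_with_claude` are illustrative names, and the limit of 4 is an arbitrary default.

```python
import asyncio
from typing import Awaitable, Callable, List


async def analyze_many(
    texts: List[str],
    analyze: Callable[[str], Awaitable[str]],
    max_concurrency: int = 4,
) -> List[str]:
    """Run analyze() on every text with at most max_concurrency calls in flight.

    asyncio.gather returns results in the same order as the inputs.
    """
    semaphore = asyncio.Semaphore(max_concurrency)

    async def run_one(text: str) -> str:
        async with semaphore:  # waits here while max_concurrency calls are running
            return await analyze(text)

    return list(await asyncio.gather(*(run_one(t) for t in texts)))


async def summarize_with_claude(client, text: str) -> str:
    """One possible analyze() implementation; expects an anthropic.AsyncAnthropic client."""
    response = await client.messages.create(
        model="claude-3-5-sonnet-20241022",
        max_tokens=300,
        messages=[{"role": "user", "content": f"Summarize this document:\n\n{text}"}],
    )
    return response.content[0].text


if __name__ == "__main__":
    async def fake_analyze(text: str) -> str:
        # Offline stand-in for summarize_with_claude
        await asyncio.sleep(0.01)
        return f"summary of {text!r}"

    docs = ["first document", "second document", "third document"]
    print(asyncio.run(analyze_many(docs, fake_analyze, max_concurrency=2)))
```

With a real client, pass `lambda t: summarize_with_claude(client, t)` as `analyze`, where `client = anthropic.AsyncAnthropic()`.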
Troubleshooting
- If you get authentication errors (`anthropic.AuthenticationError`), verify that the `ANTHROPIC_API_KEY` environment variable is set correctly.
- For rate limit errors (HTTP 429, raised as `anthropic.RateLimitError`), retry with exponential backoff.
- If the output is cut off (`response.stop_reason` is `"max_tokens"`), increase `max_tokens` or split large documents into smaller chunks.
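The SDK client already retries rate-limit, connection, and server errors a couple of times with a short backoff, and the count is configurable through its `max_retries` option. If you want longer or more explicit control, a generic wrapper like the sketch below retries a call with exponential backoff plus random jitter. `call_with_backoff` is a hypothetical helper, not an SDK function, and the delay values are illustrative defaults.

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def call_with_backoff(
    fn: Callable[[], T],
    retry_on: Tuple[Type[BaseException], ...],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn(), retrying on the listed exceptions with exponential backoff and jitter.

    The wait before retry n (0-based) is min(max_delay, base_delay * 2**n), scaled by a
    random factor in [0.5, 1.0] so that many clients do not retry in lockstep.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return fn()
        except retry_on:
            if attempt == max_attempts - 1:
                raise  # out of attempts: re-raise the last error
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(delay * random.uniform(0.5, 1.0))
    raise AssertionError("unreachable")


# Usage with the SDK (requires the anthropic package and an API key):
#   client = anthropic.Anthropic()
#   response = call_with_backoff(
#       lambda: client.messages.create(model="claude-3-5-sonnet-20241022", max_tokens=300,
#                                      messages=[{"role": "user", "content": prompt}]),
#       retry_on=(anthropic.RateLimitError,),
#   )
```

The injectable `sleep` parameter exists so the retry logic can be exercised without real waiting.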
Key Takeaways
- Use the Anthropic SDK with `claude-3-5-sonnet-20241022` for document analysis.
- Extract and preprocess document text before sending it to Claude.
- Raise `max_tokens` and chunk large documents to avoid truncated outputs.
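Chunking can happen before any API call. The sketch below splits on blank lines and packs paragraphs into chunks up to a character budget; characters are only a rough proxy for tokens, so the 8000-character default is an assumption to tune for your model and `max_tokens` setting. `was_truncated` checks the response's `stop_reason`, which the API sets to `"max_tokens"` when a reply is cut off. Both function names are illustrative.

```python
from typing import Any, List


def chunk_text(text: str, max_chars: int = 8000) -> List[str]:
    """Split text on blank lines into chunks of at most max_chars characters.

    Paragraphs are kept whole when they fit; a paragraph longer than max_chars
    is cut into fixed-size pieces (possibly mid-word).
    """
    if max_chars < 1:
        raise ValueError("max_chars must be positive")
    chunks: List[str] = []
    current = ""
    for paragraph in (p.strip() for p in text.split("\n\n")):
        if not paragraph:
            continue
        pieces = [paragraph[i : i + max_chars] for i in range(0, len(paragraph), max_chars)]
        for piece in pieces:
            candidate = f"{current}\n\n{piece}" if current else piece
            if len(candidate) <= max_chars:
                current = candidate  # piece fits in the current chunk
            else:
                chunks.append(current)  # close the current chunk, start a new one
                current = piece
    if current:
        chunks.append(current)
    return chunks


def was_truncated(response: Any) -> bool:
    """True if the reply stopped because it reached max_tokens rather than finishing."""
    return getattr(response, "stop_reason", None) == "max_tokens"


if __name__ == "__main__":
    sample = "First paragraph.\n\nSecond paragraph.\n\n" + "x" * 30
    for index, chunk in enumerate(chunk_text(sample, max_chars=40)):
        print(index, len(chunk))
```

A common follow-up is to summarize each chunk separately and then ask Claude to merge the partial summaries into one.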