Pydantic model extraction pattern
Why this matters
Real-world data extraction requires both LLM reasoning and type safety: this pattern combines Claude's semantic understanding with Python's runtime validation, eliminating the parse-and-validate cycle most developers write manually.
Explanation
What it does: The Pydantic extraction pattern uses Claude's tool-calling capability to return structured data conforming to a Pydantic model schema, giving you validated, typed data from a single API round-trip. Instead of prompting Claude to return JSON and then parsing it yourself, you define a Pydantic model, express it as a JSON Schema tool definition, and let Claude fill in the tool's arguments guided by that schema.
How it works: You define a Pydantic model with Field descriptions and constraints, then supply a matching JSON Schema as the tool's input_schema (written by hand, as in the code below, or generated from the model). Claude processes the user input, decides to invoke the tool, and returns arguments shaped by that schema. You extract the tool input, instantiate your Pydantic model with it (which validates), and use the typed object directly. Only the client-side Pydantic step enforces your constraints; the schema guides Claude's output but does not guarantee it.
When to use it: Use this when extracting entities, parsing forms, or transforming unstructured text into structured records where you need guarantees that the output conforms to your application's data model. This is production-safe because failed validations raise exceptions you can handle, and Claude's tool use is more reliable than asking for raw JSON.
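Because the model and the tool schema carry the same information, one option is to generate input_schema from the model instead of maintaining both by hand. A minimal sketch, assuming Pydantic v2 (model_json_schema() is the v2 method); tool_from_model is a hypothetical helper name. The generated schema also contains title annotations, which are valid JSON Schema keywords.

```python
from pydantic import BaseModel, Field


class Person(BaseModel):
    name: str = Field(description="Full name of the person")
    # ge/le become "minimum"/"maximum" in the generated JSON Schema
    age: int = Field(ge=0, le=150, description="Age in years")
    email: str = Field(description="Email address")
    occupation: str = Field(description="Job title or profession")


def tool_from_model(model: type[BaseModel], name: str, description: str) -> dict:
    """Build a tool definition whose input_schema is derived from the model."""
    return {
        "name": name,
        "description": description,
        # model_json_schema() returns an object schema with properties and required
        "input_schema": model.model_json_schema(),
    }


person_tool = tool_from_model(Person, "extract_person", "Extract person information from text")
print(sorted(person_tool["input_schema"]["required"]))  # ['age', 'email', 'name', 'occupation']
```

Deriving the schema this way means adding a field to Person automatically adds it to the tool, so the two cannot drift apart.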
Request code
```python
import logging

from anthropic import Anthropic
from pydantic import BaseModel, Field

logger = logging.getLogger(__name__)


class Person(BaseModel):
    name: str = Field(description="Full name of the person")
    # ge/le actually enforce the range the description promises
    age: int = Field(ge=0, le=150, description="Age in years, must be between 0 and 150")
    email: str = Field(description="Email address")
    occupation: str = Field(description="Job title or profession")


def extract_person_from_text(text: str) -> Person:
    client = Anthropic()  # reads ANTHROPIC_API_KEY from the environment

    # Hand-written tool schema: it must mirror the Person fields exactly
    person_schema = {
        "name": "extract_person",
        "description": "Extract person information from text",
        "input_schema": {
            "type": "object",
            "properties": {
                "name": {"type": "string", "description": "Full name of the person"},
                "age": {"type": "integer", "description": "Age in years, must be between 0 and 150"},
                "email": {"type": "string", "description": "Email address"},
                "occupation": {"type": "string", "description": "Job title or profession"},
            },
            "required": ["name", "age", "email", "occupation"],
        },
    }

    response = client.messages.create(
        model="claude-opus-4-6",
        max_tokens=1024,
        tools=[person_schema],
        messages=[
            {
                "role": "user",
                "content": f"Extract person information from this text:\n\n{text}",
            }
        ],
    )

    # The tool call can appear anywhere in the content list, so scan every block
    for block in response.content:
        if block.type == "tool_use" and block.name == "extract_person":
            # Log the raw arguments before validation so failures are debuggable
            logger.debug("extract_person input: %s", block.input)
            # Pydantic validates here and raises ValidationError on bad data
            return Person(**block.input)

    raise ValueError("Claude did not invoke the extract_person tool")


if __name__ == "__main__":
    sample_text = (
        "Meet Sarah Chen, a 34-year-old software engineer. She works at TechCorp "
        "and can be reached at sarah.chen@techcorp.com."
    )
    person = extract_person_from_text(sample_text)
    print(f"Extracted: {person}")
    print(f"Name: {person.name}, Age: {person.age}, Email: {person.email}")
```
Authentication
Ensure your ANTHROPIC_API_KEY environment variable is set. The Anthropic SDK reads this at client instantiation time. Export it before running your script: export ANTHROPIC_API_KEY='sk-ant-...' or load it from a .env file using python-dotenv.
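A small guard can fail fast with a clear message instead of surfacing an authentication error on the first request. This sketch treats python-dotenv as an optional dependency (by default its load_dotenv() does not override variables that are already exported); require_api_key is a hypothetical helper name.

```python
import os


def require_api_key() -> str:
    """Return ANTHROPIC_API_KEY, loading a .env file first if python-dotenv is installed."""
    try:
        # Optional dependency: pip install python-dotenv
        from dotenv import load_dotenv

        # By default load_dotenv() leaves already-exported variables untouched
        load_dotenv()
    except ImportError:
        pass  # Fall back to whatever is already in the environment
    key = os.environ.get("ANTHROPIC_API_KEY")
    if not key:
        raise RuntimeError("ANTHROPIC_API_KEY is not set; export it or add it to a .env file")
    return key
```

Calling it once at startup, before constructing Anthropic(), moves the failure to the point where the fix is obvious.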
Response shape
| Field | Description |
|---|---|
content | List of content blocks, one of which has type='tool_use' |
tool_use_block.type | "tool_use" |
tool_use_block.name | "extract_person" (matches tool name) |
tool_use_block.input | Dictionary with keys matching Pydantic model fields (name, age, email, occupation) |
stop_reason | "tool_use" when tool is invoked |
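The shape above translates into a lookup that checks stop_reason and then scans content for the matching tool_use block. The SimpleNamespace objects below are offline stand-ins that mimic the attributes of the SDK's response and block objects (the id value is made up); find_tool_input is a hypothetical helper, not an SDK function.

```python
from types import SimpleNamespace
from typing import Any


def find_tool_input(response: Any, tool_name: str) -> dict:
    """Return the arguments of the first tool_use block that calls `tool_name`."""
    if response.stop_reason != "tool_use":
        raise ValueError(f"expected stop_reason 'tool_use', got {response.stop_reason!r}")
    for block in response.content:
        # Text blocks have no .input, so filter on type before touching it
        if block.type == "tool_use" and block.name == tool_name:
            return block.input
    raise ValueError(f"no tool_use block for {tool_name!r}")


# Offline stand-in with the same attribute names as an SDK response
fake_response = SimpleNamespace(
    stop_reason="tool_use",
    content=[
        SimpleNamespace(type="text", text="I'll extract that."),
        SimpleNamespace(
            type="tool_use",
            id="toolu_example",  # made-up id for illustration
            name="extract_person",
            input={
                "name": "Sarah Chen",
                "age": 34,
                "email": "sarah.chen@techcorp.com",
                "occupation": "software engineer",
            },
        ),
    ],
)
print(find_tool_input(fake_response, "extract_person")["age"])  # 34
```

Filtering on block.type first is what prevents the AttributeError that comes from reading .input on a text block.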
Field guide
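content: Array of response blocks. Always iterate through it to find the block with type='tool_use'; it is not guaranteed to be the first element.
tool_use_block.input: The raw dictionary that becomes your Pydantic model's constructor arguments. Validation happens when you pass it to the model, not before.
stop_reason: If it is not 'tool_use', Claude did not end on a tool call. 'end_turn' means it chose to answer in prose, which usually means the prompt was ambiguous or the input didn't match the schema's intent; 'max_tokens' means the response was cut off.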
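When Claude might answer in prose instead of calling the tool, the Messages API's tool_choice parameter can force a specific tool. This sketch only builds the keyword arguments for client.messages.create, using {"type": "tool", "name": ...}; build_extraction_request is a hypothetical helper, and the default model string is copied from the request code above.

```python
def build_extraction_request(text: str, tool: dict, model: str = "claude-opus-4-6") -> dict:
    """Keyword arguments for client.messages.create that force a call to `tool`."""
    return {
        "model": model,
        "max_tokens": 1024,
        "tools": [tool],
        # Forces Claude to call this specific tool instead of replying in prose
        "tool_choice": {"type": "tool", "name": tool["name"]},
        "messages": [
            {
                "role": "user",
                "content": f"Extract person information from this text:\n\n{text}",
            }
        ],
    }
```

Usage would look like client.messages.create(**build_extraction_request(text, person_schema)). With the tool forced, Claude cannot decline to call it, so a stop_reason other than "tool_use" more likely points to a truncated response (max_tokens).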
Setup trap
Developers forget that a hand-written tool schema must match the Pydantic model exactly: field names, types, and required status. If you add a field to the Pydantic model but not to the input_schema properties, Claude has no slot for it and won't populate it, even if it could infer the value. If that field is required, every call fails with a ValidationError for the missing field; if it has a default such as None, the mismatch is silent until you realize the field is always None.
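If you keep a hand-written schema, a drift check in your test suite can catch the mismatch before it reaches production. A minimal sketch assuming Pydantic v2 (model_fields and FieldInfo.is_required()); it compares field names and required status only, not types, and schema_drift is a hypothetical helper. Here occupation is made optional to show the silent case.

```python
from typing import Optional

from pydantic import BaseModel, Field


class Person(BaseModel):
    name: str = Field(description="Full name of the person")
    age: int = Field(description="Age in years")
    email: str = Field(description="Email address")
    # Optional with a default: a missing schema entry silently yields None
    occupation: Optional[str] = Field(default=None, description="Job title or profession")


def schema_drift(model: type[BaseModel], input_schema: dict) -> list[str]:
    """List mismatches between a Pydantic model and a hand-written tool input_schema."""
    problems = []
    props = set(input_schema.get("properties", {}))
    required = set(input_schema.get("required", []))
    for field_name, info in model.model_fields.items():
        if field_name not in props:
            problems.append(f"{field_name}: missing from input_schema properties")
        elif info.is_required() and field_name not in required:
            problems.append(f"{field_name}: required in model but not in schema")
    # Schema properties the model would ignore
    for extra in sorted(props - set(model.model_fields)):
        problems.append(f"{extra}: in schema but not in model")
    return problems
```

An assertion like `assert schema_drift(Person, person_schema["input_schema"]) == []` in a unit test turns the silent mismatch into a failing build.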
Cost
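Each tool-use call is billed at standard Claude token pricing for input and output, and the tool definition itself counts as input tokens on every request. A single extraction might use 150-300 tokens depending on text length and schema complexity. For a batch of 1000 records that is 150,000-300,000 tokens, or roughly $0.45-0.90 assuming a blended rate of about $3 per million tokens (April 2026); rates differ by model, so check current pricing for the model you call. Pre-filtering text to remove noise and keeping Field descriptions concise can reduce token spend by an estimated 10-20%.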
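The batch estimate is simple arithmetic, so it is easy to rerun with your own numbers. In this sketch the $3-per-million-token rate is back-solved from the figures above, not a quoted price, and estimate_batch_cost is a hypothetical helper.

```python
def estimate_batch_cost(
    records: int,
    input_tokens_per_record: int,
    output_tokens_per_record: int,
    input_usd_per_mtok: float,
    output_usd_per_mtok: float,
) -> float:
    """Estimated USD cost of a batch, given per-record tokens and per-million-token prices."""
    input_cost = records * input_tokens_per_record * input_usd_per_mtok
    output_cost = records * output_tokens_per_record * output_usd_per_mtok
    return (input_cost + output_cost) / 1_000_000


# Reproduces the range above under an assumed blended $3 per million tokens
low = estimate_batch_cost(1000, 150, 0, 3.0, 0.0)
high = estimate_batch_cost(1000, 300, 0, 3.0, 0.0)
print(f"${low:.2f} to ${high:.2f}")  # $0.45 to $0.90
```

For real numbers, feed in the token counts from each response's usage field (input_tokens and output_tokens) and the published prices for your model, which usually differ between input and output.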
Rate limits
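Tool-use calls count as normal messages against your rate limit. If you extract 100 records per minute, monitor for 429 responses and retry with exponential backoff. The Python SDK already retries 429s twice by default (configurable via max_retries), but sustained batch jobs usually need their own backoff or throttling on top of that. Limits vary by model and usage tier, so check your organization's current limits rather than relying on a fixed figure such as 2000-5000 requests per minute.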
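For jobs that outlast the SDK's built-in retries, a generic retry wrapper with exponential backoff and full jitter is one option. This is a sketch, not an SDK feature: with_backoff is a hypothetical helper, is_retryable decides which exceptions to retry, and sleep is injectable so the logic can be tested without waiting.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def with_backoff(
    call: Callable[[], T],
    is_retryable: Callable[[Exception], bool],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run `call`, retrying retryable errors with exponential backoff and full jitter."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return call()
        except Exception as exc:
            # Re-raise non-retryable errors and the error from the final attempt
            if not is_retryable(exc) or attempt == max_attempts - 1:
                raise
            # Cap grows as base_delay * 2**attempt, bounded by max_delay
            cap = min(max_delay, base_delay * 2 ** attempt)
            # Full jitter: sleep a random time in [0, cap] so workers desynchronize
            sleep(random.uniform(0, cap))
    raise AssertionError("unreachable: the loop always returns or raises")
```

With the real SDK this might be called as with_backoff(lambda: client.messages.create(**kwargs), lambda e: isinstance(e, anthropic.RateLimitError)). Full jitter keeps parallel workers from retrying in lockstep after a shared 429.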
Common gotcha
Passing tool input straight to Pydantic without recording it hides what Claude actually returned. If Claude's output violates your Pydantic constraints (e.g., age=999), the ValidationError names the failing constraint, but a broad try/except added while debugging can swallow it and leave no trace of the offending value. Always log block.input before calling Person(**block.input) so you can compare what Claude generated with what failed validation.
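A minimal sketch of "log first, then validate, never swallow": the raw dictionary is logged before Pydantic sees it, and a ValidationError is logged with its structured details and re-raised. validate_tool_input and the "extraction" logger name are hypothetical.

```python
import json
import logging

from pydantic import BaseModel, Field, ValidationError

logger = logging.getLogger("extraction")


class Person(BaseModel):
    name: str = Field(description="Full name of the person")
    age: int = Field(ge=0, le=150, description="Age in years")
    email: str = Field(description="Email address")
    occupation: str = Field(description="Job title or profession")


def validate_tool_input(raw: dict) -> Person:
    """Log Claude's raw tool arguments, then validate; re-raise so failures are never hidden."""
    logger.info("raw tool input: %s", json.dumps(raw, default=str))
    try:
        return Person(**raw)
    except ValidationError as exc:
        # exc.errors() lists each failing field location and the reason
        logger.error("validation failed: %s", exc.errors())
        raise
```

Enable logging.basicConfig(level=logging.INFO) to see the raw inputs; the error line is emitted at the default level.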
Error recovery
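ValidationError: Claude's tool arguments failed Pydantic validation.
ValueError (tool not invoked): raised by extract_person_from_text when no tool_use block is found.
AttributeError on block.input: usually means the code read .input from a block that is not a tool_use block, such as a text block.
Experienced dev note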
The real power of this pattern is that the schema's types and descriptions shape Claude's reasoning: it is less likely to return output that passes a regex but violates business logic (like an age of 999), though only Pydantic actually enforces the rule. The cost-benefit is highest when extraction rules are semantic rather than syntactic (e.g., 'infer job title from context' vs 'find text between two keywords'). If roughly 5% of records fail validation, routing just those to a human review step is cheaper than chasing the last few points of accuracy through prompting, and it documents the ambiguous cases.
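The review step can start as a simple split of validated records from failures, keeping each failing raw dictionary next to its error message. A sketch under the assumption that raw tool inputs are collected into a list first; triage is a hypothetical helper.

```python
from pydantic import BaseModel, Field, ValidationError


class Person(BaseModel):
    name: str
    age: int = Field(ge=0, le=150)
    email: str
    occupation: str


def triage(raw_records: list[dict]) -> tuple[list[Person], list[tuple[dict, str]]]:
    """Split Claude's raw outputs into validated records and a human-review queue."""
    accepted: list[Person] = []
    needs_review: list[tuple[dict, str]] = []
    for raw in raw_records:
        try:
            accepted.append(Person(**raw))
        except ValidationError as exc:
            # Keep the raw dict plus the reason so a reviewer sees exactly what failed
            needs_review.append((raw, str(exc)))
    return accepted, needs_review
```

The review queue doubles as a record of the inputs your prompt or schema handles poorly.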
Check your understanding
If Claude returns age=150 but your Pydantic model has Field(ge=0, le=130), what happens and why? Would adding strict=True to the Pydantic field change the outcome?
Answer hint
Pydantic validation fires when you instantiate the model with Claude's output. The strict parameter controls type coercion (e.g., whether the string "150" is accepted as the integer 150), not range validation. The ge/le constraints are checked either way, so instantiation fails and you catch a ValidationError; strict=True does not change the outcome. Claude never sees constraints that exist only in your local Pydantic model (unless you copy them into the tool schema), so it isn't avoiding the mistake: you are catching it.
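The hint can be checked directly with two otherwise identical models, one lax and one strict. A minimal sketch assuming Pydantic v2 (Field(strict=True) and model_validate); accepts is a hypothetical helper.

```python
from pydantic import BaseModel, Field, ValidationError


class LaxPerson(BaseModel):
    age: int = Field(ge=0, le=130)


class StrictPerson(BaseModel):
    age: int = Field(ge=0, le=130, strict=True)


def accepts(model: type[BaseModel], payload: dict) -> bool:
    """Return True if the payload validates against the model."""
    try:
        model.model_validate(payload)
        return True
    except ValidationError:
        return False


# 150 breaks le=130 in both modes; the string "130" is coerced in lax mode
# but rejected in strict mode. Prints:
# LaxPerson False True
# StrictPerson False False
for model in (LaxPerson, StrictPerson):
    print(model.__name__, accepts(model, {"age": 150}), accepts(model, {"age": "130"}))
```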