API Intermediate medium · 6 min

Pydantic model extraction pattern

What you will learn

Use Claude to extract structured data into Pydantic models by leveraging the Messages API with tool use and validation.

Why this matters

Real-world data extraction requires both LLM reasoning and type safety: this pattern combines Claude's semantic understanding with Python's runtime validation, eliminating the parse-and-validate cycle most developers write manually.

Skip if: You're extracting data with a single, fixed regex pattern or parsing deterministic formats (CSV, XML); there, Pydantic alone suffices. Skip Claude when the extraction logic is simple enough that an API call's latency and cost aren't justified. Use plain prompting without tool use if you don't need structured output.

Explanation

What it does: The Pydantic extraction pattern uses Claude's tool-calling capability to return structured data conforming to a Pydantic model schema, ensuring type safety and validation in a single API round-trip. Instead of prompting Claude to return JSON and then parsing it yourself, you define a Pydantic model, convert it to a JSON Schema tool definition, and let Claude populate it while respecting your constraints.

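How it works: You define a Pydantic model with Field descriptions and constraints, then describe the same fields as a JSON Schema tool definition, either by hand or generated from the model (for example with Pydantic v2's model_json_schema()), and pass it in the tools parameter of the request. Claude processes the user input and returns structured arguments intended to match your schema. By default Claude decides whether to invoke the tool; setting tool_choice to {"type": "tool", "name": "extract_person"} forces the call. You take the tool input, instantiate your Pydantic model with it (which validates), and use the typed object directly. Enforcement happens client-side in Pydantic; on Claude's side the schema guides the output but is not a hard guarantee.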
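The schema-generation step described above can be sketched as follows. This is a minimal illustration assuming Pydantic v2, where `model_json_schema()` is available; `tool_from_model` is a hypothetical helper name, not part of either library, and nested models (which emit `$defs`) are not handled here.

```python
from pydantic import BaseModel, Field


class Person(BaseModel):
    name: str = Field(description="Full name of the person")
    age: int = Field(ge=0, le=150, description="Age in years")
    email: str = Field(description="Email address")
    occupation: str = Field(description="Job title or profession")


def tool_from_model(model: type[BaseModel], name: str, description: str) -> dict:
    """Build a tool definition dict whose input_schema comes from the model.

    Hypothetical helper: it only wraps Pydantic's generated JSON Schema in
    the name/description/input_schema shape used in the request code.
    """
    return {
        "name": name,
        "description": description,
        "input_schema": model.model_json_schema(),
    }


person_tool = tool_from_model(
    Person, "extract_person", "Extract person information from text"
)
```

Generating the schema this way keeps field names, required status, and numeric bounds (`ge`/`le` become `minimum`/`maximum`) in one place, so the tool definition cannot drift from the model.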

When to use it: Use this when extracting entities, parsing forms, or transforming unstructured text into structured records where your application needs output that conforms to its data model. It holds up in production because failed validations raise exceptions you can handle, and tool use is generally more reliable than asking for raw JSON in free text.

Request code

python

from anthropic import Anthropic
from pydantic import BaseModel, Field

class Person(BaseModel):
    name: str = Field(description="Full name of the person")
    # ge/le enforce the range client-side; the description alone only informs Claude
    age: int = Field(ge=0, le=150, description="Age in years, must be between 0 and 150")
    email: str = Field(description="Email address")
    occupation: str = Field(description="Job title or profession")

def extract_person_from_text(text: str) -> Person:
    client = Anthropic()
    
    person_schema = {
        "name": "extract_person",
        "description": "Extract person information from text",
        "input_schema": {
            "type": "object",
            "properties": {
                "name": {"type": "string", "description": "Full name of the person"},
                "age": {"type": "integer", "description": "Age in years, must be between 0 and 150"},
                "email": {"type": "string", "description": "Email address"},
                "occupation": {"type": "string", "description": "Job title or profession"}
            },
            "required": ["name", "age", "email", "occupation"]
        }
    }
    
    response = client.messages.create(
        model="claude-opus-4-6",
        max_tokens=1024,
        tools=[person_schema],
        messages=[
            {
                "role": "user",
                "content": f"Extract person information from this text:\n\n{text}"
            }
        ]
    )
    
    for block in response.content:
        if block.type == "tool_use":
            if block.name == "extract_person":
                person_data = block.input
                person = Person(**person_data)
                return person
    
    raise ValueError("Claude did not invoke the extract_person tool")

if __name__ == "__main__":
    sample_text = "Meet Sarah Chen, a 34-year-old software engineer. She works at TechCorp and can be reached at sarah.chen@techcorp.com."
    person = extract_person_from_text(sample_text)
    print(f"Extracted: {person}")
    print(f"Name: {person.name}, Age: {person.age}, Email: {person.email}")

Authentication

Ensure your ANTHROPIC_API_KEY environment variable is set. The Anthropic SDK reads this at client instantiation time. Export it before running your script: export ANTHROPIC_API_KEY='sk-ant-...' or load it from a .env file using python-dotenv.

Response shape

Field	Description
`content`	List of content blocks, one of which has type='tool_use'
`tool_use_block.type`	"tool_use"
`tool_use_block.name`	"extract_person" (matches tool name)
`tool_use_block.input`	Dictionary with keys matching Pydantic model fields (name, age, email, occupation)
`stop_reason`	"tool_use" when tool is invoked

Field guide

content

Array of response blocks: always iterate through this to find type='tool_use'

tool_use_block.input

The raw dictionary you pass to your Pydantic model's constructor: validation happens at that call, not before

stop_reason

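If not 'tool_use', Claude did not complete a tool call. 'end_turn' usually means it answered in text instead (the prompt was ambiguous or the input didn't match the schema's intent); 'max_tokens' means the response was cut off before it finished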
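The field guide above can be condensed into one small lookup function. This is a sketch: `find_tool_input` is a hypothetical helper, and `fake_response` is a stand-in built from `SimpleNamespace` purely to show the shape, not an SDK object.

```python
from types import SimpleNamespace


def find_tool_input(response, tool_name: str) -> dict:
    """Return the input dict of the first tool_use block with the given name."""
    for block in response.content:
        # Check the type first: text blocks have no name or input attribute.
        if block.type == "tool_use" and block.name == tool_name:
            return block.input
    raise ValueError(
        f"No {tool_name!r} tool call in response "
        f"(stop_reason={response.stop_reason!r})"
    )


# Stand-in for a Messages API response, for illustration only.
fake_response = SimpleNamespace(
    stop_reason="tool_use",
    content=[
        SimpleNamespace(type="text", text="Extracting the person now."),
        SimpleNamespace(
            type="tool_use",
            name="extract_person",
            input={"name": "Sarah Chen", "age": 34},
        ),
    ],
)

person_data = find_tool_input(fake_response, "extract_person")
```

Including `stop_reason` in the error message makes it easier to tell a text-only answer apart from a truncated response when the lookup fails.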

Setup trap

Developers forget that a hand-written tool schema must match the Pydantic model exactly: field names, types, and required status. If a field exists on the model but not in the input_schema properties, Claude has no reason to populate it even when it can infer the value. For a required field, instantiation raises ValidationError; for an optional field with a default, the mismatch is silent until you notice the field is always None.
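One way to catch this drift early is a startup or unit-test check that compares a hand-written schema against the model. The sketch below assumes Pydantic v2 (`model_fields` and `FieldInfo.is_required()`); `schema_mismatches` is a hypothetical helper, and the deliberately incomplete schema is made up for the example.

```python
from typing import Optional

from pydantic import BaseModel, Field


class Person(BaseModel):
    name: str = Field(description="Full name of the person")
    age: int = Field(ge=0, le=150, description="Age in years")
    email: str = Field(description="Email address")
    occupation: Optional[str] = Field(default=None, description="Job title")


def schema_mismatches(model: type[BaseModel], input_schema: dict) -> dict:
    """Compare field names and required status; types are not checked here."""
    model_fields = set(model.model_fields)
    model_required = {
        name for name, info in model.model_fields.items() if info.is_required()
    }
    schema_props = set(input_schema.get("properties", {}))
    schema_required = set(input_schema.get("required", []))
    return {
        "missing_from_schema": model_fields - schema_props,
        "unknown_in_schema": schema_props - model_fields,
        "required_mismatch": model_required ^ schema_required,
    }


# Hand-written schema that forgot the optional "occupation" field.
incomplete_schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "age": {"type": "integer"},
        "email": {"type": "string"},
    },
    "required": ["name", "age", "email"],
}

problems = schema_mismatches(Person, incomplete_schema)
```

Here `occupation` is optional on the model, so this is exactly the silent case: nothing raises at extraction time, and only a check like this surfaces the gap.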

Cost

Each tool-use call incurs standard Claude pricing for input and output tokens, and the tool definition itself counts toward input tokens on every request. A single extraction might use 150-300 tokens depending on text length and schema complexity. For batch extraction of 1000 records, expect roughly $0.45-0.90 (an April 2026 estimate; the actual figure depends on the model and current pricing). Pre-filtering text to remove noise and keeping Field descriptions concise can reduce token spend, by an estimated 10-20%.
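The arithmetic behind a batch estimate is simple enough to script. In this sketch the per-million-token prices and the token counts are placeholders chosen for round numbers, not real rates; substitute the current prices for your model and token counts measured from the usage field of real responses.

```python
def estimate_batch_cost_usd(
    records: int,
    input_tokens_per_record: int,
    output_tokens_per_record: int,
    input_price_per_mtok: float,
    output_price_per_mtok: float,
) -> float:
    """Estimate batch cost from per-record token counts and per-MTok prices."""
    input_cost = records * input_tokens_per_record * input_price_per_mtok / 1_000_000
    output_cost = records * output_tokens_per_record * output_price_per_mtok / 1_000_000
    return input_cost + output_cost


# Placeholder inputs for illustration only (not actual pricing).
example_cost = estimate_batch_cost_usd(
    records=1000,
    input_tokens_per_record=250,   # prompt + tool definition + source text
    output_tokens_per_record=50,   # the tool_use arguments
    input_price_per_mtok=1.0,      # hypothetical USD per million input tokens
    output_price_per_mtok=5.0,     # hypothetical USD per million output tokens
)
```

Because the tool definition is resent with every request, input tokens usually dominate for short texts, which is why trimming descriptions and source text pays off.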

Rate limits

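Tool-use calls count as normal Messages API requests against your rate limits, which cover both requests per minute and tokens per minute. If you extract 100 records per minute, monitor for 429 responses and back off exponentially when they occur, honoring the retry-after header when it is present. Actual limits depend on your usage tier and model, so check the limits shown for your account rather than assuming a fixed number.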
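Exponential backoff with jitter can be sketched generically. `call_with_backoff` is a hypothetical helper; the delays and attempt count are arbitrary defaults, and the exception type to retry on is passed in so the function stays independent of any SDK.

```python
import random
import time


def call_with_backoff(
    fn,
    retry_on,
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
    sleep=time.sleep,
):
    """Call fn(), retrying on `retry_on` with exponentially growing delays."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except retry_on:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            delay = min(max_delay, base_delay * 2 ** attempt)
            # Jitter spreads out retries from concurrent workers.
            sleep(delay + random.uniform(0, delay / 2))
```

With the Anthropic Python SDK this would typically be used as `call_with_backoff(lambda: extract_person_from_text(text), retry_on=anthropic.RateLimitError)`. The SDK client also retries some failed requests on its own (configurable through its `max_retries` option), so an outer loop like this mainly matters for long batch jobs.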

Common gotcha

Passing tool input straight to Pydantic without logging it hides what Claude actually returned. If Claude's output violates your Pydantic constraints (e.g., age=999), instantiation raises ValidationError, and a broad try/except added during debugging can swallow it along with the offending payload. Always log block.input before calling Person(**block.input) so you can compare what Claude generated with what failed validation.
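A log-then-validate wrapper might look like the sketch below. It assumes Pydantic v2 and the standard `logging` module; `validate_person` and the logger name are hypothetical, and the error is re-raised so the caller still decides how to recover.

```python
import logging

from pydantic import BaseModel, Field, ValidationError

logger = logging.getLogger("extraction")


class Person(BaseModel):
    name: str = Field(description="Full name of the person")
    age: int = Field(ge=0, le=150, description="Age in years")
    email: str = Field(description="Email address")
    occupation: str = Field(description="Job title or profession")


def validate_person(raw: dict) -> Person:
    """Log the raw tool input, then validate; never swallow the error."""
    logger.debug("Raw tool input: %r", raw)
    try:
        return Person(**raw)
    except ValidationError as exc:
        # Record both the payload and which fields failed, then re-raise.
        logger.warning("Validation failed for %r: %s", raw, exc.errors())
        raise
```

Calling `validate_person(block.input)` in place of `Person(**block.input)` leaves the control flow unchanged while keeping a record of every payload that failed.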

Error recovery

ValidationError

Pydantic raised this because Claude's output didn't match constraints (e.g., age outside 0-150 range). Catch it and log block.input, then either retry with a clearer schema description or use a follow-up message asking Claude to correct the field.
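The follow-up-message option can be sketched as a function that builds the next request's messages list. It returns the validation failure as a `tool_result` block with `is_error` set, paired with the original `tool_use` block's id. `build_correction_messages` is a hypothetical helper, and rebuilding the assistant turn from the tool_use block alone is a simplification: in practice you would pass back the assistant content exactly as it was returned.

```python
def build_correction_messages(
    user_prompt: str,
    tool_use_id: str,
    tool_name: str,
    bad_input: dict,
    error_text: str,
) -> list:
    """Build a conversation that reports a validation failure back to Claude."""
    return [
        {"role": "user", "content": user_prompt},
        {
            "role": "assistant",
            "content": [
                {
                    "type": "tool_use",
                    "id": tool_use_id,
                    "name": tool_name,
                    "input": bad_input,
                }
            ],
        },
        {
            "role": "user",
            "content": [
                {
                    "type": "tool_result",
                    "tool_use_id": tool_use_id,
                    "is_error": True,
                    "content": (
                        f"Validation failed: {error_text}. "
                        f"Call {tool_name} again with corrected values."
                    ),
                }
            ],
        },
    ]
```

Send the result as `messages` in a second `client.messages.create` call with the same `tools` list, and cap the number of correction rounds so a persistently invalid record falls through to manual review instead of looping.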

ValueError (tool not invoked)

Claude ignored your tool and responded with text instead, usually because the input was ambiguous or the tool didn't look relevant. Make the user message more direct ('Extract using this format' instead of 'Can you extract?'), or force the call by passing tool_choice={"type": "tool", "name": "extract_person"} in the request.

AttributeError on block.input

You're iterating content blocks but not checking type=='tool_use' first. Non-tool blocks don't have an input attribute. Always filter: [b for b in response.content if b.type == 'tool_use']

Experienced dev note

The real power of this pattern is that the schema and its descriptions shape Claude's reasoning: it is far less likely to return values that pass a regex but violate business logic (like an age of 999), though only client-side validation actually enforces the constraint. The cost-benefit is highest when extraction rules are semantic rather than syntactic (e.g., 'infer job title from context' vs 'find text between two keywords'). If, say, 95% of records validate cleanly, layer a human review step on the 5% where Pydantic raises ValidationError: this is cheaper than chasing perfect prompting, and it documents the ambiguous cases.

Check your understanding

If Claude returns age=150 but your Pydantic model has Field(ge=0, le=130), what happens and why? Would adding strict=True to the Pydantic field change the outcome?

Answer hint

Pydantic validation fires when you instantiate the model with Claude's output. The strict parameter controls type coercion (e.g., whether "150" is accepted for 150), not range validation. The ge/le constraints are checked either way, so instantiation fails and you catch the ValidationError. Constraints that live only in your client-side Pydantic model are invisible to Claude, so it isn't avoiding the mistake: you're catching it.
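The difference between range checks and strict mode can be checked directly. This sketch assumes Pydantic v2, where `Field(strict=True)` disables type coercion for that field; `LaxAge`, `StrictAge`, and `accepts` are names made up for the illustration.

```python
from pydantic import BaseModel, Field, ValidationError


class LaxAge(BaseModel):
    age: int = Field(ge=0, le=130)


class StrictAge(BaseModel):
    age: int = Field(ge=0, le=130, strict=True)


def accepts(model: type[BaseModel], value) -> bool:
    """Return True if the model validates the given age value."""
    try:
        model(age=value)
        return True
    except ValidationError:
        return False


# Range violation: rejected by both models, strict or not.
out_of_range = (accepts(LaxAge, 150), accepts(StrictAge, 150))

# Numeric string within range: coerced in lax mode, rejected in strict mode.
numeric_string = (accepts(LaxAge, "120"), accepts(StrictAge, "120"))
```

Here `out_of_range` is `(False, False)` and `numeric_string` is `(True, False)`: strict mode changes only whether the string is coerced, while the upper bound applies in both cases.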

VERSION anthropic==0.94.x uses the Messages API with tool_use content blocks. Older versions (0.7.x) used different tool formats. Always pin your anthropic version: pip install 'anthropic>=0.94,<0.95' to ensure tool schemas work as documented.
