API Intermediate medium · 6 min

Nested object extraction

What you will learn

Use Claude's structured output API with nested schemas to extract deeply hierarchical data from unstructured text in a single API call.

Why this matters

Real-world data is rarely flat: invoices have line items, documents have sections with subsections, APIs return nested responses. Extracting nested structures with Claude eliminates the need for multiple API calls or fragile regex parsing, reducing latency and cost while improving accuracy.

Skip if: Don't use structured output if your extraction target is simple and flat (single-level fields). Standard text completion is faster and cheaper. Also avoid structured output if you need streaming responses: Claude's structured mode returns the full object at once.

Explanation

What it does: Claude's structured output mode accepts a JSON schema that defines nested objects and validates the model's response against that schema. When you provide a schema with nested properties, Claude returns a complete, valid JSON object matching your structure: no parsing needed.

How it works: You define a Pydantic model or raw JSON schema with nested fields (using properties and $defs). The API processes your prompt and returns a parsed response where the entire object hierarchy is guaranteed valid. The Claude model understands the nesting depth and extracts relationships between parent and child objects correctly, without hallucinating extra or missing levels.

When to use it: Use this when extracting structured data from unstructured documents (contracts, medical records, customer feedback), converting prose into databases, or enriching messy input. The nested pattern shines when your extraction has 2+ levels of hierarchy and accuracy matters more than speed.

Request code

python

from anthropic import Anthropic
import json
from typing import Optional

client = Anthropic()

text_to_extract = """
Invoice #2026-001 from Acme Corp dated April 15, 2026.
Line items:
- 5 units of Widget A at $10 each, shipped to New York
- 2 units of Widget B at $25 each, expedited to California
Notes: Customer is a returning client with 10% loyalty discount applied.
"""

response = client.messages.create(
    model="claude-opus-4-6",
    max_tokens=1024,
    messages=[
        {
            "role": "user",
            "content": f"Extract the invoice data from this text and return it as JSON:\n\n{text_to_extract}"
        }
    ],
    tools=[
        {
            "name": "extract_invoice",
            "description": "Extracts structured invoice data including line items and metadata",
            "input_schema": {
                "type": "object",
                "properties": {
                    "invoice_number": {
                        "type": "string",
                        "description": "Invoice ID"
                    },
                    "vendor": {
                        "type": "string",
                        "description": "Vendor name"
                    },
                    "date": {
                        "type": "string",
                        "description": "Invoice date"
                    },
                    "line_items": {
                        "type": "array",
                        "items": {
                            "type": "object",
                            "properties": {
                                "product_name": {
                                    "type": "string"
                                },
                                "quantity": {
                                    "type": "integer"
                                },
                                "unit_price": {
                                    "type": "number"
                                },
                                "destination": {
                                    "type": "string"
                                },
                                "shipping_method": {
                                    "type": "string",
                                    "enum": ["standard", "expedited"]
                                }
                            },
                            "required": ["product_name", "quantity", "unit_price"]
                        }
                    },
                    "metadata": {
                        "type": "object",
                        "properties": {
                            "is_returning_customer": {
                                "type": "boolean"
                            },
                            "discount_applied": {
                                "type": "number",
                                "description": "Discount percentage"
                            },
                            "notes": {
                                "type": "string",
                                "description": "Additional notes"
                            }
                        }
                    }
                },
                "required": ["invoice_number", "vendor", "date", "line_items"]
            }
        }
    ]
)

if response.content[0].type == "tool_use":
    extracted_data = response.content[0].input
    print("Extracted Invoice:")
    print(json.dumps(extracted_data, indent=2))
else:
    print("No tool call in response")
    print(response.content[0].text)

Authentication

Set your Anthropic API key as an environment variable before running: export ANTHROPIC_API_KEY='sk-ant-...'. The Anthropic client reads this automatically on instantiation.

Response shape

Field	Description
`content`	list of content blocks
`content[0].type`	tool_use (when schema validation succeeds)
`content[0].input`	parsed nested object matching your schema
`content[0].input.invoice_number`	string
`content[0].input.line_items`	array of objects with product_name, quantity, unit_price, destination, shipping_method
`content[0].input.metadata`	object with is_returning_customer, discount_applied, notes
`stop_reason`	tool_use (indicates successful schema match)

Field guide

content[0].input

The actual extracted data: this is what you care about. It's already parsed as a Python dict, not a string.

stop_reason

Will be 'tool_use' on success. If it's 'end_turn' or 'max_tokens', the model didn't call your tool, meaning extraction failed or the schema was misunderstood.

line_items

Array of nested objects. The model correctly groups related fields (quantity with unit_price, not mixing across items). This is where the nesting magic happens.

metadata

Optional nested object many developers miss: it captures context about the extraction (customer status, discounts) that enriches the primary data without cluttering the main schema.

Setup trap

The tools parameter is required even though this looks like structured output, not tool use. Without declaring the tool, Claude ignores your schema. Also, the nested schema must include required fields at each level: omitting them leaves optional fields that the model may skip, breaking your downstream parsing.

Cost

Nested extraction costs the same as a standard API call (input + output tokens). However, extracting multiple flat objects separately costs more: one nested call is cheaper than N flat calls. For an invoice with 10 line items, one nested extraction (~500 output tokens) beats 10 separate calls (~1,000 tokens minimum).

Rate limits

Not a concern for this specific operation: extraction doesn't hit rate limits harder than any other call. However, if you're extracting from thousands of documents in parallel, batch them: use the Batch API for 50% cost savings.

Common gotcha

Developers often check response.content[0].text expecting text, but tool_use responses don't have a text field: they have an input field containing the parsed object. Checking the wrong field returns None or crashes.

Error recovery

KeyError on content[0].input

The tool wasn't called. Check that response.content[0].type == 'tool_use'. If it's 'end_turn', your schema was too strict or your prompt was unclear. Relax constraints or clarify instructions.

ValidationError in nested properties

Claude returned a partial object missing required nested fields. Add <code>"required": [...]</code> at each nesting level. Also verify your enum values match what the model can infer from the text.

TypeError: object is not subscriptable

You're trying to access <code>response.content[0]</code> when content is empty. This means the API call succeeded but the model didn't generate any content block. Check that max_tokens is high enough and the tool definition is valid JSON.

Experienced dev note

Nested extraction beats prompt engineering. Instead of writing elaborate instructions to handle complex hierarchies, let the schema enforce structure. The model is smarter with a schema than with prose instructions. Also: always make nested fields optional unless absolutely required: it gives Claude wiggle room when the input text is ambiguous, reducing hallucinations. Finally, use enum for categorical nested fields (like shipping_method above). It cuts hallucination by 90% and costs the same.

Check your understanding

You're extracting customer orders where each order contains nested items, and some items have nested discounts applied at the item level. Your current schema requires every discount object to have a reason_code. The API returns a valid response, but some items are missing discount objects entirely. Is this a schema problem, a model problem, or neither: and what's your next move?

Show answer hint

It's neither. The schema allows the discount object to be omitted (not in required[]), so the model correctly returned items without discounts. This is working as designed. Your next move: decide if discount is always expected (add to required) or if null/missing is acceptable. The schema gave you the right behavior: the question is whether you designed it to match your business logic.

VERSION This pattern uses anthropic 0.94.x SDK. The tool-use pattern for structured output is stable and backward compatible. If upgrading from 0.92.x or earlier, no breaking changes affect nested extraction: same API shape.

Community Notes

No notes yetBe the first to share a version-specific fix or tip.