Code Intermediate medium · 6 min

with_structured_output(): the modern pattern

What you will learn

Use <code>with_structured_output()</code> to bind a Pydantic schema directly to an LLM and get validated Python objects back instead of parsing JSON strings yourself.

Why this matters

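Parsing LLM JSON responses manually is error-prone, and hand-written retry logic wastes tokens. <code>with_structured_output()</code> handles schema conversion, parsing, and validation for you, which removes most of the parsing boilerplate and turns malformed responses into explicit errors instead of silent failures. It does not retry on its own, so add retries separately if you need them.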

Skip if: you only need the raw LLM response for debugging or audit trails, or your LLM provider doesn't support function calling (older models, or smaller open-source LLMs without native tool-calling support).

Explanation

What it is: with_structured_output() is a method on LangChain chat model objects that attaches a Pydantic schema as a constraint, steering the LLM to return data matching that schema. LangChain handles the parsing; you get back a Python object.

How it works mechanically: When you call llm.with_structured_output(MySchema), LangChain converts your Pydantic model into a JSON schema, sends it to the LLM's function-calling API (like OpenAI's tools parameter), and the LLM responds with a tool call matching your schema. LangChain then deserializes that tool call back into a Pydantic instance. If the LLM produces data that doesn't match the schema, Pydantic validation raises an error during this deserialization step, so malformed data never reaches your code disguised as a valid object. Some providers can additionally enforce the schema on their side, but don't assume that unless your provider documents it.
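The conversion and deserialization steps can be sketched with Pydantic alone, without calling an LLM. This is an illustration under assumptions: it relies on Pydantic v2's <code>model_json_schema()</code> and <code>model_validate()</code>, and <code>fake_tool_args</code> is a hand-written stand-in for the arguments a tool call might carry, not real provider output.

```python
from pydantic import BaseModel, Field


class Person(BaseModel):
    name: str = Field(description="Full name of the person")
    age: int = Field(description="Age in years")
    occupation: str = Field(description="What they do for work")


# Step 1: the Pydantic model becomes a JSON schema
# (roughly what ends up inside the tool definition sent to the provider).
schema = Person.model_json_schema()
print(sorted(schema["properties"]))

# Step 2: the LLM answers with arguments shaped like that schema.
# Hypothetical stand-in for a tool call's arguments; not real provider output.
fake_tool_args = {"name": "Alice", "age": 32, "occupation": "software engineer"}

# Step 3: the arguments are validated and turned into a Person instance.
person = Person.model_validate(fake_tool_args)
print(person)
```

The field descriptions travel inside the schema, which is why they act as instructions to the LLM.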

When to use it: Use this whenever you need structured data from an LLM: entity extraction, classification, data transformation, report generation. It's the default pattern in LangChain 1.2.x whenever the output shape is fixed and known in advance.

Analogy

It's like declaring a function signature and then calling that function through the LLM. The LLM fills in the values; the type system enforces the shape. No more guessing what the JSON will look like.

Code

python

from pydantic import BaseModel, Field
from langchain_openai import ChatOpenAI

class Person(BaseModel):
    name: str = Field(description="Full name of the person")
    age: int = Field(description="Age in years")
    occupation: str = Field(description="What they do for work")

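# temperature=0 keeps extraction as repeatable as the provider allows
llm = ChatOpenAI(model="gpt-4o", temperature=0)
# Returns a new runnable whose output is parsed into a Person instance
structured_llm = llm.with_structured_output(Person)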

result = structured_llm.invoke(
    "Extract person info: Alice is 32 years old and works as a software engineer."
)

print(f"Type: {type(result)}")
print(f"Name: {result.name}")
print(f"Age: {result.age}")
print(f"Occupation: {result.occupation}")
print(f"\nFull object: {result}")

Output

Type: <class '__main__.Person'>
Name: Alice
Age: 32
Occupation: software engineer

Full object: name='Alice' age=32 occupation='software engineer'

What just happened?

We created a Pydantic schema <code>Person</code>, wrapped the LLM with <code>with_structured_output(Person)</code>, and invoked it with unstructured text. The LLM received the schema as a tool definition, structured its response to match it, and LangChain deserialized the response into an actual <code>Person</code> instance: not a dict, not a string, an instance. The <code>type()</code> call proves it's the real class, and attribute access works directly.

Common gotcha

Developers often assume with_structured_output() works with every LLM provider. It doesn't: it requires function-calling (tool-calling) support. Integrations without it, such as some locally served open-source models and older models, raise NotImplementedError when you call with_structured_output(); the exact message varies by integration and version. A model with weak tool-calling support may not raise at all and instead return output that fails validation. Check your provider's documentation for tool-calling support before relying on this in production.

Error recovery

NotImplementedError: Model does not support tool calling

Your LLM provider doesn't have function-calling support. Switch to a model that does (gpt-4o, Claude 3.5, Gemini 2.0) or fall back to <code>llm | JsonOutputParser()</code> with manual parsing.
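A minimal sketch of detecting this failure up front, so you can choose the fallback deliberately. It assumes the error is raised when <code>with_structured_output()</code> is called rather than at invoke time. The two model classes and the <code>bind_schema_or_none</code> helper are hypothetical stand-ins written for this example, not LangChain classes.

```python
class SupportsStructured:
    """Hypothetical stand-in for a chat model with tool-calling support."""

    def with_structured_output(self, schema):
        return f"structured runnable for {schema.__name__}"


class NoStructured:
    """Hypothetical stand-in for a model without tool-calling support."""

    def with_structured_output(self, schema):
        raise NotImplementedError("structured output is not supported by this model")


class Person:
    """Placeholder schema; a real one would be a Pydantic BaseModel."""


def bind_schema_or_none(llm, schema):
    """Return the structured runnable, or None when the model cannot provide one."""
    try:
        return llm.with_structured_output(schema)
    except NotImplementedError:
        # The caller should switch models or fall back to a JSON output parser.
        return None


supported = bind_schema_or_none(SupportsStructured(), Person)
unsupported = bind_schema_or_none(NoStructured(), Person)
print(supported)
print(unsupported)
```

Returning None keeps the decision in one place: the calling code can log the gap and build the parser-based chain instead.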

ValidationError from pydantic

The LLM returned data that doesn't match your schema (e.g., age as "thirty-two" instead of 32). Make your Field descriptions more explicit, for instance by adding an example: <code>Field(description='Age in years, e.g., 32')</code>. With some integrations, such as ChatOpenAI, you can also pick a different strategy through the <code>method</code> parameter, for example <code>method="json_mode"</code>.
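To see which values trigger this error, the sketch below validates hand-written payloads against the schema. It assumes Pydantic v2 in its default (lax) mode; the payloads are illustrative stand-ins for LLM output.

```python
from pydantic import BaseModel, Field, ValidationError


class Person(BaseModel):
    name: str = Field(description="Full name of the person")
    age: int = Field(description="Age in years, e.g., 32")
    occupation: str = Field(description="What they do for work")


# A numeric string is coerced to int in Pydantic's default (lax) mode.
coerced = Person.model_validate(
    {"name": "Alice", "age": "32", "occupation": "software engineer"}
)
print(coerced.age)

# A value that cannot be read as an integer raises ValidationError.
try:
    Person.model_validate(
        {"name": "Alice", "age": "thirty-two", "occupation": "software engineer"}
    )
    failing_fields = []
except ValidationError as exc:
    # Each error records the location of the offending field.
    failing_fields = [err["loc"][0] for err in exc.errors()]
print(failing_fields)
```

Logging the failing field names is often enough to tell you which description needs a clearer format hint.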

AttributeError: 'dict' object has no attribute 'name'

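You're getting a dict, not a Pydantic instance. This usually means the schema you passed to <code>with_structured_output()</code> was a JSON schema dict or a TypedDict rather than a Pydantic class, or you set <code>include_raw=True</code>, which returns a dict holding the raw message alongside the parsed object. Check that you pass a Pydantic BaseModel subclass, and read the parsed entry of the dict when <code>include_raw</code> is on.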
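One case worth handling explicitly is <code>include_raw=True</code>, where the result is a dict with the keys raw, parsed, and parsing_error. The helper below is a sketch: <code>unwrap_parsed</code> is a hypothetical name, and <code>raw_style</code> is a hand-built stand-in for such a result, not a real response.

```python
from pydantic import BaseModel


class Person(BaseModel):
    name: str
    age: int
    occupation: str


def unwrap_parsed(result):
    """Return the parsed object whether or not include_raw=True was used."""
    if isinstance(result, dict) and "parsed" in result:
        if result.get("parsing_error") is not None:
            raise ValueError(f"Structured output failed to parse: {result['parsing_error']}")
        return result["parsed"]
    # Already a parsed object (the default when include_raw is off).
    return result


alice = Person(name="Alice", age=32, occupation="software engineer")

# Hypothetical stand-in for what invoke() returns with include_raw=True.
raw_style = {"raw": "<AIMessage placeholder>", "parsed": alice, "parsing_error": None}

print(unwrap_parsed(raw_style).name)
print(unwrap_parsed(alice).name)
```

This keeps attribute access such as result.name working in both configurations while still surfacing parsing errors.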

TypeError: expected Person, got str

You're passing the schema as a string or an instance instead of the class itself. Use <code>with_structured_output(Person)</code>, not <code>with_structured_output('Person')</code> or <code>with_structured_output(Person())</code>.

Experienced dev note

The mental shift here: stop thinking of the LLM as a text generator that you parse. Think of it as a function executor. with_structured_output() makes that real. This also means your schema design directly impacts LLM cost: more fields and longer descriptions mean more input tokens, because the schema is sent with every request. Don't rely on Field(exclude=True) to hide internal bookkeeping fields from the LLM, since it controls serialization rather than what goes into the schema; keep those fields off the LLM-facing model entirely, for example in a subclass or a separate model. Also, the LLM's temperature should be 0 or very low when using structured output; high temperature tends to increase validation failures.
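One way to keep bookkeeping fields away from the LLM is to split the schema in two. This is a sketch under assumptions: <code>PersonExtraction</code>, <code>PersonRecord</code>, and the <code>source_doc_id</code> and <code>reviewed</code> fields are hypothetical names invented for the example.

```python
from pydantic import BaseModel, Field


class PersonExtraction(BaseModel):
    """LLM-facing schema: only the fields the model should fill in."""

    name: str = Field(description="Full name of the person")
    age: int = Field(description="Age in years")


class PersonRecord(PersonExtraction):
    """Internal record: adds bookkeeping fields the LLM never sees."""

    source_doc_id: str
    reviewed: bool = False


# Only this schema would be passed to with_structured_output().
llm_schema = PersonExtraction.model_json_schema()
print(sorted(llm_schema["properties"]))

# Hypothetical stand-in for a structured LLM result.
extracted = PersonExtraction(name="Alice", age=32)

# Bookkeeping is attached afterwards, in your own code.
record = PersonRecord(**extracted.model_dump(), source_doc_id="doc-001")
print(record)
```

The smaller schema also costs fewer input tokens per request, which is the cost point made above.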

Check your understanding

You have a schema Event with fields title, start_time (datetime), and description. After calling structured_llm.invoke() with input text, your code crashes with a validation error when the LLM returns a malformed datetime. Should you (a) switch to string fields, (b) add a custom validator to parse the string, (c) improve the Field description to show the exact format expected, or (d) increase temperature? Why?

Answer hint

The answer is (c). The LLM isn't failing because the schema is wrong; it's failing because your schema didn't tell the LLM the format you wanted. The LLM will try to conform if you're explicit. Temperature (d) makes it worse. Validators (b) are for post-processing, not LLM guidance. And strings (a) defeat the purpose. The fix: add <code>Field(description="ISO 8601 format, e.g., 2026-04-15T14:30:00Z")</code>.
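A sketch of the Event schema with the format spelled out in the description. The title and description field texts are assumptions added for the example, and the payload is a hand-written stand-in for LLM output. It assumes Pydantic v2, which parses ISO 8601 strings into datetime values.

```python
from datetime import datetime

from pydantic import BaseModel, Field


class Event(BaseModel):
    title: str = Field(description="Short title of the event")
    start_time: datetime = Field(
        description="ISO 8601 format, e.g., 2026-04-15T14:30:00Z"
    )
    description: str = Field(description="One-sentence summary of the event")


# The format hint is part of the schema the LLM receives.
schema = Event.model_json_schema()
print(schema["properties"]["start_time"]["description"])

# Hypothetical stand-in for a well-formed LLM response.
event = Event.model_validate(
    {
        "title": "Quarterly review",
        "start_time": "2026-04-15T14:30:00Z",
        "description": "Review of first-quarter results.",
    }
)
print(event.start_time.isoformat())
```

Because the field stays typed as datetime, downstream code still receives a real datetime object rather than a string.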

VERSION Before with_structured_output() existed, structured output required manual JSON schema building and OutputFixingParser. The method arrived during the 0.x releases (this lesson dates it to langchain 0.2.15, October 2024; check the changelog if the exact version matters to you) and is the standard in 1.2.x. Do not use the deprecated create_extraction_chain(); it was removed in 1.0.0.

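Once you have structured single objects, learn to extract arrays of structured records in a single LLM call, which is essential for batch entity extraction and list-based parsing tasks. The usual approach is a container schema with a list field, passed as <code>with_structured_output(YourContainerSchema)</code>, since the schema argument is expected to be a model class or schema definition rather than a bare <code>list[YourSchema]</code>.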
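For multi-record extraction, a widely used pattern is a container model whose single field is a list of records. The sketch below shows only the schema side. <code>People</code> is a hypothetical name, and <code>fake_args</code> is a hand-written stand-in for what the LLM might return.

```python
from pydantic import BaseModel, Field


class Person(BaseModel):
    name: str = Field(description="Full name of the person")
    age: int = Field(description="Age in years")


class People(BaseModel):
    """Container schema so several records come back from one call."""

    people: list[Person] = Field(description="Every person mentioned in the text")


# Hypothetical stand-in for the arguments of a single tool call.
fake_args = {
    "people": [
        {"name": "Alice", "age": 32},
        {"name": "Bob", "age": 45},
    ]
}

result = People.model_validate(fake_args)
print([p.name for p in result.people])
```

With a real model you would pass People to with_structured_output() and read result.people from the returned object.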
