How to · intermediate · 3 min read

How to add structured output endpoint to FastAPI LLM app

Quick answer
Use FastAPI to define a POST endpoint that calls an LLM via the openai SDK, then parse and return the model's response as structured JSON. Use the response_format parameter (JSON mode) or prompt engineering to ensure the LLM outputs JSON, and a Pydantic model to validate the response.
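The parse-and-validate half of that flow can be sketched in isolation. This assumes Pydantic v2 (for `model_validate_json`); `parse_reply` is a hypothetical helper name, with a raw string standing in for the LLM reply:

```python
from typing import Optional
from pydantic import BaseModel, ValidationError

class StructuredOutput(BaseModel):
    name: str
    age: int
    email: str

def parse_reply(raw: str) -> Optional[StructuredOutput]:
    """Validate a raw model reply against the schema; None on any failure."""
    try:
        # model_validate_json handles both JSON decoding and field validation
        return StructuredOutput.model_validate_json(raw)
    except ValidationError:
        return None

parse_reply('{"name": "Ada", "age": 36, "email": "ada@example.com"}')  # valid -> StructuredOutput
parse_reply('{"name": "Ada"}')  # missing fields -> None
```

Returning None (rather than raising) lets the endpoint decide between a fallback value and an HTTP error.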

PREREQUISITES

  • Python 3.8+
  • OpenAI API key (free tier works)
  • pip install fastapi uvicorn openai pydantic

Setup

Install required packages and set your OpenAI API key as an environment variable.

  • Install FastAPI, Uvicorn, OpenAI SDK, and Pydantic:
bash
pip install fastapi uvicorn openai pydantic

Step by step

Create a FastAPI app with a POST endpoint that sends a prompt to the OpenAI gpt-4o-mini model and requests JSON output. Use a Pydantic model to validate and return the structured data.

python
import json
import os

from fastapi import FastAPI
from pydantic import BaseModel
from openai import OpenAI

app = FastAPI()
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

class StructuredOutput(BaseModel):
    name: str
    age: int
    email: str

@app.post("/structured-output", response_model=StructuredOutput)
def structured_output(prompt: str):
    # Plain def: FastAPI runs sync handlers in a threadpool, so the
    # blocking sync client does not stall the event loop.
    # JSON mode requires the word "JSON" to appear in the messages.
    system_prompt = "You are a helpful assistant that outputs JSON with fields: name, age, email."
    messages = [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": prompt}
    ]
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=messages,
        response_format={"type": "json_object"},  # JSON mode: output is valid JSON
        temperature=0
    )
    content = response.choices[0].message.content or ""

    # Parse JSON from the LLM output
    try:
        data = json.loads(content)
    except json.JSONDecodeError:
        return StructuredOutput(name="", age=0, email="")  # fallback empty

    return StructuredOutput(**data)
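Without JSON mode, models often wrap their JSON in Markdown code fences even when told not to. A stdlib-only helper (hypothetical name `extract_json`) can strip the fences before parsing:

```python
import json
import re

def extract_json(content: str) -> dict:
    """Strip optional ```json ... ``` fences, then parse the remainder."""
    match = re.search(r"```(?:json)?\s*(.*?)\s*```", content, re.DOTALL)
    if match:
        content = match.group(1)
    return json.loads(content)

extract_json('```json\n{"name": "Ada", "age": 36, "email": "a@b.c"}\n```')
extract_json('{"name": "Ada", "age": 36, "email": "a@b.c"}')  # plain JSON also works
```

Call it in place of the bare `json.loads(content)` if you are relying on prompt engineering alone.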

Common variations

  • Use an async client (e.g. the SDK's AsyncOpenAI) with await so requests don't block the event loop.
  • Switch to other models like claude-3-5-haiku-20241022 by changing the client and prompt accordingly.
  • Implement streaming responses for real-time output.
python
import json

from anthropic import Anthropic

client = Anthropic(api_key=os.environ["ANTHROPIC_API_KEY"])

@app.post("/structured-output-anthropic", response_model=StructuredOutput)
def structured_output_anthropic(prompt: str):
    system = "You are a helpful assistant that outputs JSON with fields: name, age, email."
    message = client.messages.create(
        model="claude-3-5-haiku-20241022",
        max_tokens=512,
        system=system,
        messages=[{"role": "user", "content": prompt}]
    )
    # message.content is a list of content blocks; the text is on the first one
    try:
        data = json.loads(message.content[0].text)
    except json.JSONDecodeError:
        return StructuredOutput(name="", age=0, email="")
    return StructuredOutput(**data)
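If parsing still fails intermittently, a common variation is to retry a bounded number of times before falling back. A hedged sketch, where `parse_with_retries` and `call_model` are hypothetical names and `call_model` stands in for either SDK call above:

```python
import json
from typing import Callable, Optional

def parse_with_retries(call_model: Callable[[], str], attempts: int = 3) -> Optional[dict]:
    """Call the model up to `attempts` times; return the first valid JSON object."""
    for _ in range(attempts):
        try:
            data = json.loads(call_model())
            if isinstance(data, dict):  # reject bare strings/arrays
                return data
        except json.JSONDecodeError:
            continue  # malformed output; try again
    return None  # caller decides the fallback

# Example with a fake model that fails once, then succeeds:
replies = iter(["not json", '{"name": "Ada", "age": 36, "email": "a@b.c"}'])
result = parse_with_retries(lambda: next(replies))  # -> the parsed dict
```

With temperature=0 retries rarely change the output, so keep the attempt count low and log failures instead of looping indefinitely.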

Troubleshooting

  • If JSON parsing fails, verify the prompt instructs the model to output strict JSON.
  • Use temperature=0 to reduce randomness and improve structured output consistency.
  • Check your environment variable OPENAI_API_KEY is set correctly.
  • For deployment, ensure uvicorn runs your FastAPI app properly.
bash
uvicorn main:app --reload
output
INFO:     Uvicorn running on http://127.0.0.1:8000 (Press CTRL+C to quit)
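A quick startup check turns a missing API key into an immediate, actionable error instead of a 500 at request time. A stdlib-only sketch with a hypothetical `require_env` helper:

```python
import os

def require_env(name: str) -> str:
    """Return the environment variable's value or raise with a clear message."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"{name} is not set; export it before starting uvicorn")
    return value

# e.g. at module import time, before constructing the client:
# api_key = require_env("OPENAI_API_KEY")
```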

Key Takeaways

  • Use prompt engineering to get LLMs to output strict JSON for structured endpoints.
  • Validate and parse LLM JSON output with Pydantic models in FastAPI.
  • Set temperature to 0 for deterministic structured responses.
  • Use environment variables for API keys to keep credentials secure.
  • Test endpoints locally with Uvicorn before deployment.
Verified 2026-04 · gpt-4o-mini, claude-3-5-haiku-20241022