How to · intermediate · 3 min read

How to add structured output endpoint to FastAPI LLM app

Quick answer
Use FastAPI to define a POST endpoint that calls an LLM via the openai SDK, then parse and return the model's response as structured JSON. Use the response_format parameter (JSON mode) or prompt engineering to ensure the LLM outputs JSON, and a Pydantic model to validate the response.
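The parse-and-validate half of that flow can be sketched in isolation. This assumes Pydantic v2 (for `model_validate_json`); `parse_reply` is a hypothetical helper name, with a raw string standing in for the LLM reply:

```python
from typing import Optional
from pydantic import BaseModel, ValidationError

class StructuredOutput(BaseModel):
    name: str
    age: int
    email: str

def parse_reply(raw: str) -> Optional[StructuredOutput]:
    """Validate a raw model reply against the schema; None on any failure."""
    try:
        # model_validate_json handles both JSON decoding and field validation
        return StructuredOutput.model_validate_json(raw)
    except ValidationError:
        return None

parse_reply('{"name": "Ada", "age": 36, "email": "ada@example.com"}')  # valid -> StructuredOutput
parse_reply('{"name": "Ada"}')  # missing fields -> None
```

Returning None (rather than raising) lets the endpoint decide between a fallback value and an HTTP error.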

PREREQUISITES

  • Python 3.8+
  • OpenAI API key (free tier works)
  • pip install fastapi uvicorn openai pydantic

Setup

Install required packages and set your OpenAI API key as an environment variable.

  • Install FastAPI, Uvicorn, OpenAI SDK, and Pydantic:
bash
pip install fastapi uvicorn openai pydantic

Step by step

Create a FastAPI app with a POST endpoint that sends a prompt to the OpenAI gpt-4o-mini model and requests JSON output. Use a Pydantic model to validate and return the structured data.

python
import json
import os

from fastapi import FastAPI
from pydantic import BaseModel
from openai import OpenAI

app = FastAPI()
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

class StructuredOutput(BaseModel):
    name: str
    age: int
    email: str

@app.post("/structured-output", response_model=StructuredOutput)
def structured_output(prompt: str):
    # Plain def: FastAPI runs sync handlers in a threadpool, so the
    # blocking sync client does not stall the event loop.
    # JSON mode requires the word "JSON" to appear in the messages.
    system_prompt = "You are a helpful assistant that outputs JSON with fields: name, age, email."
    messages = [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": prompt}
    ]
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=messages,
        response_format={"type": "json_object"},  # JSON mode: output is valid JSON
        temperature=0
    )
    content = response.choices[0].message.content or ""

    # Parse JSON from the LLM output
    try:
        data = json.loads(content)
    except json.JSONDecodeError:
        return StructuredOutput(name="", age=0, email="")  # fallback empty

    return StructuredOutput(**data)
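Without JSON mode, models often wrap their JSON in Markdown code fences even when told not to. A stdlib-only helper (hypothetical name `extract_json`) can strip the fences before parsing:

```python
import json
import re

def extract_json(content: str) -> dict:
    """Strip optional ```json ... ``` fences, then parse the remainder."""
    match = re.search(r"```(?:json)?\s*(.*?)\s*```", content, re.DOTALL)
    if match:
        content = match.group(1)
    return json.loads(content)

extract_json('```json\n{"name": "Ada", "age": 36, "email": "a@b.c"}\n```')
extract_json('{"name": "Ada", "age": 36, "email": "a@b.c"}')  # plain JSON also works
```

Call it in place of the bare `json.loads(content)` if you are relying on prompt engineering alone.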

Common variations

  • Use an async client (e.g. the SDK's AsyncOpenAI) with await so requests don't block the event loop.
  • Switch to other models like claude-3-5-haiku-20241022 by changing the client and prompt accordingly.
  • Implement streaming responses for real-time output.
python
import json

from anthropic import Anthropic

client = Anthropic(api_key=os.environ["ANTHROPIC_API_KEY"])

@app.post("/structured-output-anthropic", response_model=StructuredOutput)
def structured_output_anthropic(prompt: str):
    system = "You are a helpful assistant that outputs JSON with fields: name, age, email."
    message = client.messages.create(
        model="claude-3-5-haiku-20241022",
        max_tokens=512,
        system=system,
        messages=[{"role": "user", "content": prompt}]
    )
    # message.content is a list of content blocks; the text is on the first one
    try:
        data = json.loads(message.content[0].text)
    except json.JSONDecodeError:
        return StructuredOutput(name="", age=0, email="")
    return StructuredOutput(**data)
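If parsing still fails intermittently, a common variation is to retry a bounded number of times before falling back. A hedged sketch, where `parse_with_retries` and `call_model` are hypothetical names and `call_model` stands in for either SDK call above:

```python
import json
from typing import Callable, Optional

def parse_with_retries(call_model: Callable[[], str], attempts: int = 3) -> Optional[dict]:
    """Call the model up to `attempts` times; return the first valid JSON object."""
    for _ in range(attempts):
        try:
            data = json.loads(call_model())
            if isinstance(data, dict):  # reject bare strings/arrays
                return data
        except json.JSONDecodeError:
            continue  # malformed output; try again
    return None  # caller decides the fallback

# Example with a fake model that fails once, then succeeds:
replies = iter(["not json", '{"name": "Ada", "age": 36, "email": "a@b.c"}'])
result = parse_with_retries(lambda: next(replies))  # -> the parsed dict
```

With temperature=0 retries rarely change the output, so keep the attempt count low and log failures instead of looping indefinitely.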

Troubleshooting

  • If JSON parsing fails, verify the prompt instructs the model to output strict JSON.
  • Use temperature=0 to reduce randomness and improve structured output consistency.
  • Check your environment variable OPENAI_API_KEY is set correctly.
  • For deployment, ensure uvicorn runs your FastAPI app properly.
bash
uvicorn main:app --reload
output
INFO:     Uvicorn running on http://127.0.0.1:8000 (Press CTRL+C to quit)
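A quick startup check turns a missing API key into an immediate, actionable error instead of a 500 at request time. A stdlib-only sketch with a hypothetical `require_env` helper:

```python
import os

def require_env(name: str) -> str:
    """Return the environment variable's value or raise with a clear message."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"{name} is not set; export it before starting uvicorn")
    return value

# e.g. at module import time, before constructing the client:
# api_key = require_env("OPENAI_API_KEY")
```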

Key Takeaways

  • Use prompt engineering to get LLMs to output strict JSON for structured endpoints.
  • Validate and parse LLM JSON output with Pydantic models in FastAPI.
  • Set temperature to 0 for deterministic structured responses.
  • Use environment variables for API keys to keep credentials secure.
  • Test endpoints locally with Uvicorn before deployment.
Verified 2026-04 · gpt-4o-mini, claude-3-5-haiku-20241022