How-to · Beginner · 3 min read

How to set up FastAPI for LLM apps

Quick answer
Use FastAPI to build a lightweight web server and integrate it with an LLM API such as OpenAI's by calling client.chat.completions.create() inside an endpoint. This setup supports scalable, optionally asynchronous LLM-powered applications with minimal code.

Prerequisites

  • Python 3.8+
  • OpenAI API key (free tier works)
  • pip install fastapi uvicorn "openai>=1.0"

Setup

Install FastAPI for the web framework, uvicorn as the ASGI server, and the openai Python SDK for LLM API calls. Set your OpenAI API key as an environment variable.

bash
pip install fastapi uvicorn "openai>=1.0"

# Set your API key in your shell environment (replace with your actual key)
export OPENAI_API_KEY="your-api-key-here"
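
Before starting the server, it can help to verify that the key is actually visible to Python. A minimal sketch (the helper name check_api_key is illustrative, not part of any SDK):

```python
import os

def check_api_key(env=os.environ) -> bool:
    """Return True when OPENAI_API_KEY is present and non-empty."""
    return bool(env.get("OPENAI_API_KEY"))

if not check_api_key():
    print("OPENAI_API_KEY is not set; export it before starting the server")
```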

Step by step

Create a FastAPI app with a POST endpoint that accepts user input, calls the OpenAI gpt-4.1 model, and returns the generated text. This example uses synchronous code for simplicity.

python
import os
from fastapi import FastAPI, HTTPException
from pydantic import BaseModel
from openai import OpenAI

app = FastAPI()
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

class PromptRequest(BaseModel):
    prompt: str

@app.post("/generate")
def generate_text(request: PromptRequest):
    try:
        response = client.chat.completions.create(
            model="gpt-4.1",
            messages=[{"role": "user", "content": request.prompt}]
        )
        return {"response": response.choices[0].message.content}
    except Exception as e:
        raise HTTPException(status_code=500, detail=str(e))

# To run:
# uvicorn filename:app --reload
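
FastAPI validates the request body against PromptRequest before the endpoint runs, returning HTTP 422 for malformed input. A minimal sketch of that validation at the Pydantic level, no server needed:

```python
from pydantic import BaseModel, ValidationError

class PromptRequest(BaseModel):
    prompt: str

# A well-formed body parses cleanly into the model.
req = PromptRequest(prompt="Summarize ASGI in one sentence")

# A body missing the required field raises ValidationError;
# FastAPI converts this into an HTTP 422 response automatically.
try:
    PromptRequest()
except ValidationError:
    pass
```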

Common variations

  • Use async def endpoints with await for non-blocking calls.
  • Switch OpenAI models by changing the model parameter (e.g., gpt-4.1, gpt-4.1-mini); models from other providers, such as claude-3-5-sonnet-20241022, require that provider's own SDK.
  • Integrate streaming responses for real-time token generation.
  • Use other SDKs like anthropic or langchain for advanced workflows.
For example, an async endpoint uses the AsyncOpenAI client so the event loop is not blocked while waiting on the API:

python
import os
from fastapi import FastAPI, HTTPException
from pydantic import BaseModel
from openai import AsyncOpenAI

app = FastAPI()
client = AsyncOpenAI(api_key=os.environ["OPENAI_API_KEY"])

class PromptRequest(BaseModel):
    prompt: str

@app.post("/generate-async")
async def generate_text_async(request: PromptRequest):
    try:
        response = await client.chat.completions.create(
            model="gpt-4.1",
            messages=[{"role": "user", "content": request.prompt}]
        )
        return {"response": response.choices[0].message.content}
    except Exception as e:
        raise HTTPException(status_code=500, detail=str(e))

Troubleshooting

  • If you get 401 Unauthorized, verify your OPENAI_API_KEY environment variable is set correctly.
  • For TimeoutError, increase client timeout or check network connectivity.
  • Use uvicorn filename:app --reload to auto-reload on code changes during development.
  • Check model name spelling to avoid model not found errors.

Key Takeaways

  • Use FastAPI with OpenAI SDK for clean, scalable LLM apps.
  • Always load API keys from environment variables for security.
  • Async endpoints improve performance for concurrent LLM requests.
  • Switch models easily by changing the model parameter in API calls.
  • Handle errors gracefully with proper HTTP exceptions in FastAPI.
Verified 2026-04 · gpt-4.1, claude-3-5-sonnet-20241022