How to serve OpenAI responses with FastAPI
Quick answer
Use the openai SDK v1 with FastAPI by creating an API endpoint that calls client.chat.completions.create() with your model and messages. Return response.choices[0].message.content as the HTTP response to serve OpenAI completions via FastAPI.
Prerequisites
- Python 3.8+
- OpenAI API key (free tier works)
- pip install openai>=1.0 fastapi uvicorn
Setup
Install the required packages and set your OpenAI API key as an environment variable.
- Install packages: pip install openai fastapi uvicorn
- Set environment variable: export OPENAI_API_KEY='your_api_key' (Linux/macOS) or set OPENAI_API_KEY=your_api_key (Windows)
Step by step
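Since a missing key only surfaces later as a 401 from OpenAI, it can help to verify the variable at startup. A minimal sketch (the require_api_key helper is hypothetical, not part of the SDK):

```python
import os

def require_api_key() -> str:
    """Return the OpenAI API key from the environment, failing fast if absent."""
    key = os.environ.get("OPENAI_API_KEY")
    if not key:
        raise RuntimeError("OPENAI_API_KEY is not set")
    return key
```

Calling this once before constructing the client turns a confusing runtime 401 into a clear startup error.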
Create a FastAPI app with a POST endpoint that accepts a user prompt, calls the OpenAI gpt-4o model, and returns the generated text.
```python
import os

from fastapi import FastAPI, HTTPException
from pydantic import BaseModel
from openai import OpenAI

app = FastAPI()
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

class PromptRequest(BaseModel):
    prompt: str

@app.post("/generate")
async def generate_text(request: PromptRequest):
    try:
        response = client.chat.completions.create(
            model="gpt-4o",
            messages=[{"role": "user", "content": request.prompt}]
        )
        text = response.choices[0].message.content
        return {"response": text}
    except Exception as e:
        raise HTTPException(status_code=500, detail=str(e))

# To run: uvicorn filename:app --reload
```
Common variations
You can use async or sync calls, switch models like gpt-4o-mini, or add streaming with FastAPI WebSockets. For example, use model="gpt-4o-mini" for faster, cheaper responses. To handle streaming, integrate FastAPI WebSocket endpoints and consume OpenAI streaming responses.
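When consuming a stream, each chunk carries an optional content delta (the first chunk may hold only the role, with no text), so the client concatenates the non-empty pieces. A minimal helper sketch, independent of the SDK (join_deltas is an illustrative name, not a library function):

```python
def join_deltas(deltas):
    """Concatenate streamed content deltas, skipping None placeholders."""
    return "".join(d for d in deltas if d)
```

The `if d` filter is what keeps role-only chunks from raising a TypeError during the join.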
```python
from fastapi import WebSocket

@app.websocket("/stream")
async def stream_response(websocket: WebSocket):
    await websocket.accept()
    try:
        # Example: simplified streaming logic placeholder
        response = client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[{"role": "user", "content": "Stream this response"}],
            stream=True
        )
        for chunk in response:
            # In SDK v1, delta is an object, not a dict; content may be None
            await websocket.send_text(chunk.choices[0].delta.content or "")
    except Exception as e:
        await websocket.send_text(f"Error: {str(e)}")
    finally:
        await websocket.close()
```
Troubleshooting
- If you get 401 Unauthorized, verify your OPENAI_API_KEY environment variable is set correctly.
- For TimeoutError, increase your client timeout or check network connectivity.
- Use try-except blocks to catch API errors and return proper HTTP status codes.
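Rather than returning a blanket 500 for every failure, the except block can map exception types to HTTP statuses. A sketch that dispatches on the exception class name (the names shown follow the openai v1 SDK's error classes, but verify them against your installed version):

```python
def status_for(exc: Exception) -> int:
    """Map an exception to an HTTP status code by its class name."""
    name = type(exc).__name__
    if name == "AuthenticationError":    # bad or missing API key
        return 401
    if name == "RateLimitError":         # too many requests
        return 429
    if name in ("APITimeoutError", "TimeoutError"):
        return 504                       # upstream timed out
    return 500                           # anything else: generic server error
```

In the endpoint, `raise HTTPException(status_code=status_for(e), detail=str(e))` then gives clients an actionable status instead of an opaque 500.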
Key Takeaways
- Use the OpenAI SDK v1 with FastAPI to serve AI completions via HTTP endpoints.
- Always load API keys from environment variables for security and flexibility.
- Handle exceptions to provide meaningful HTTP error responses in production.
- Switch models or add streaming for different performance and UX needs.
- Run FastAPI apps with Uvicorn for development and production readiness.
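Once the app is running under Uvicorn, the /generate endpoint above can be exercised from Python's standard library alone. A minimal client sketch, assuming Uvicorn's default local address (build_request and call_generate are illustrative helpers):

```python
import json
from urllib import request

def build_request(prompt: str, base_url: str = "http://127.0.0.1:8000") -> request.Request:
    """Build the POST request for the /generate endpoint."""
    body = json.dumps({"prompt": prompt}).encode()
    return request.Request(
        f"{base_url}/generate",
        data=body,
        headers={"Content-Type": "application/json"},
    )

def call_generate(prompt: str) -> str:
    """Send the request and return the generated text from the JSON response."""
    with request.urlopen(build_request(prompt)) as resp:
        return json.loads(resp.read())["response"]
```

Equivalently, from the shell: curl -X POST http://127.0.0.1:8000/generate -H "Content-Type: application/json" -d '{"prompt": "Hello"}'.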