How to serve OpenAI responses with FastAPI
Quick answer
Use the openai SDK v1 with FastAPI by creating an API endpoint that calls client.chat.completions.create() with your model and messages. Return response.choices[0].message.content as the HTTP response to serve OpenAI completions via FastAPI.
Prerequisites
- Python 3.8+
- OpenAI API key (free tier works)
- pip install openai>=1.0 fastapi uvicorn
Setup
Install the required packages and set your OpenAI API key as an environment variable.
- Install packages: pip install openai fastapi uvicorn
- Set environment variable: export OPENAI_API_KEY='your_api_key' (Linux/macOS) or set OPENAI_API_KEY=your_api_key (Windows)
Step by step
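Since a missing key only surfaces later as a 401 from OpenAI, it can help to verify the variable at startup. A minimal sketch (the require_api_key helper is hypothetical, not part of the SDK):

```python
import os

def require_api_key() -> str:
    """Return the OpenAI API key from the environment, failing fast if absent."""
    key = os.environ.get("OPENAI_API_KEY")
    if not key:
        raise RuntimeError("OPENAI_API_KEY is not set")
    return key
```

Calling this once before constructing the client turns a confusing runtime 401 into a clear startup error.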
Create a FastAPI app with a POST endpoint that accepts a user prompt, calls the OpenAI gpt-4o model, and returns the generated text.
```python
import os

from fastapi import FastAPI, HTTPException
from pydantic import BaseModel
from openai import OpenAI

app = FastAPI()
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

class PromptRequest(BaseModel):
    prompt: str

@app.post("/generate")
async def generate_text(request: PromptRequest):
    try:
        response = client.chat.completions.create(
            model="gpt-4o",
            messages=[{"role": "user", "content": request.prompt}]
        )
        text = response.choices[0].message.content
        return {"response": text}
    except Exception as e:
        raise HTTPException(status_code=500, detail=str(e))

# To run: uvicorn filename:app --reload
```
Common variations
You can use async or sync calls, switch models like gpt-4o-mini, or add streaming with FastAPI WebSockets. For example, use model="gpt-4o-mini" for faster, cheaper responses. To handle streaming, integrate FastAPI WebSocket endpoints and consume OpenAI streaming responses.
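When consuming a stream, each chunk carries an optional content delta (the first chunk may hold only the role, with no text), so the client concatenates the non-empty pieces. A minimal helper sketch, independent of the SDK (join_deltas is an illustrative name, not a library function):

```python
def join_deltas(deltas):
    """Concatenate streamed content deltas, skipping None placeholders."""
    return "".join(d for d in deltas if d)
```

The `if d` filter is what keeps role-only chunks from raising a TypeError during the join.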
```python
from fastapi import WebSocket

@app.websocket("/stream")
async def stream_response(websocket: WebSocket):
    await websocket.accept()
    try:
        # Example: simplified streaming logic placeholder
        response = client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[{"role": "user", "content": "Stream this response"}],
            stream=True
        )
        for chunk in response:
            # In SDK v1, delta is an object, not a dict; content may be None
            await websocket.send_text(chunk.choices[0].delta.content or "")
    except Exception as e:
        await websocket.send_text(f"Error: {str(e)}")
    finally:
        await websocket.close()
```
Troubleshooting
- If you get 401 Unauthorized, verify your OPENAI_API_KEY environment variable is set correctly.
- For TimeoutError, increase your client timeout or check network connectivity.
- Use try-except blocks to catch API errors and return proper HTTP status codes.
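Rather than returning a blanket 500 for every failure, the except block can map exception types to HTTP statuses. A sketch that dispatches on the exception class name (the names shown follow the openai v1 SDK's error classes, but verify them against your installed version):

```python
def status_for(exc: Exception) -> int:
    """Map an exception to an HTTP status code by its class name."""
    name = type(exc).__name__
    if name == "AuthenticationError":    # bad or missing API key
        return 401
    if name == "RateLimitError":         # too many requests
        return 429
    if name in ("APITimeoutError", "TimeoutError"):
        return 504                       # upstream timed out
    return 500                           # anything else: generic server error
```

In the endpoint, `raise HTTPException(status_code=status_for(e), detail=str(e))` then gives clients an actionable status instead of an opaque 500.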
Key Takeaways
- Use the OpenAI SDK v1 with FastAPI to serve AI completions via HTTP endpoints.
- Always load API keys from environment variables for security and flexibility.
- Handle exceptions to provide meaningful HTTP error responses in production.
- Switch models or add streaming for different performance and UX needs.
- Run FastAPI apps with Uvicorn for development and production readiness.
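Once the app is running under Uvicorn, the /generate endpoint above can be exercised from Python's standard library alone. A minimal client sketch, assuming Uvicorn's default local address (build_request and call_generate are illustrative helpers):

```python
import json
from urllib import request

def build_request(prompt: str, base_url: str = "http://127.0.0.1:8000") -> request.Request:
    """Build the POST request for the /generate endpoint."""
    body = json.dumps({"prompt": prompt}).encode()
    return request.Request(
        f"{base_url}/generate",
        data=body,
        headers={"Content-Type": "application/json"},
    )

def call_generate(prompt: str) -> str:
    """Send the request and return the generated text from the JSON response."""
    with request.urlopen(build_request(prompt)) as resp:
        return json.loads(resp.read())["response"]
```

Equivalently, from the shell: curl -X POST http://127.0.0.1:8000/generate -H "Content-Type: application/json" -d '{"prompt": "Hello"}'.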