How to add WebSocket support for LLM chat in FastAPI
Quick answer
Use FastAPI's WebSocket class to create a WebSocket endpoint that handles bidirectional communication. Integrate the OpenAI SDK's chat.completions.create method inside the WebSocket handler to send user messages and stream AI responses in real time.
Prerequisites
- Python 3.8+
- OpenAI API key (free tier works)
- pip install fastapi uvicorn "openai>=1.0"
Setup
Install the required packages and set your OpenAI API key as an environment variable.
pip install fastapi uvicorn "openai>=1.0"
# Set environment variable in your shell
export OPENAI_API_KEY=your-api-key-here
Step by step
Create a FastAPI app with a WebSocket endpoint that receives user messages, calls the OpenAI gpt-4o model, and sends back streamed responses over the WebSocket connection.
import os

from fastapi import FastAPI, WebSocket, WebSocketDisconnect
from openai import AsyncOpenAI

app = FastAPI()
client = AsyncOpenAI(api_key=os.environ["OPENAI_API_KEY"])

@app.websocket("/ws/chat")
async def websocket_chat(websocket: WebSocket):
    await websocket.accept()
    try:
        while True:
            data = await websocket.receive_text()
            # Prepare messages for chat completion
            messages = [{"role": "user", "content": data}]
            # Create streaming chat completion; the async client keeps the
            # event loop free, and stream=True returns an async iterator
            response = await client.chat.completions.create(
                model="gpt-4o",
                messages=messages,
                stream=True,
            )
            # Stream tokens back to client
            async for chunk in response:
                delta = chunk.choices[0].delta.content
                if delta:
                    await websocket.send_text(delta)
            # Send a special message to indicate completion
            await websocket.send_text("<END>")
    except WebSocketDisconnect:
        print("Client disconnected")
Common variations
- Use async for with streaming to send partial responses as they arrive.
- Switch to other models like gpt-4.1 by changing the model parameter; a model such as claude-3-5-sonnet-20241022 belongs to Anthropic and requires its SDK rather than the OpenAI client.
- Implement client-side JavaScript to connect to the WebSocket and display streamed tokens in real time.
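Another common variation: the handler above is single-turn, so each message reaches the model with no memory of earlier turns. Keeping a per-connection history list fixes that; below is a minimal sketch of just the bookkeeping (the helper name and trimming policy are illustrative, not part of the original code).

```python
def build_messages(history, user_text, max_turns=10):
    """Append the new user message and return a window of recent turns."""
    history.append({"role": "user", "content": user_text})
    # Each turn is one user message plus one assistant reply (2 entries),
    # so keep at most 2 * max_turns entries to bound the prompt size.
    return history[-2 * max_turns:]

# Usage: build the prompt, stream the reply, then record it in history.
history = []
messages = build_messages(history, "Hello")
history.append({"role": "assistant", "content": "Hi! How can I help?"})
messages = build_messages(history, "What is FastAPI?")
print(len(messages))  # 3: user, assistant, user
```

In the WebSocket handler, the returned list would replace the single-element messages list, and the assembled assistant reply would be appended after streaming completes.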
Troubleshooting
- If the WebSocket connection fails, verify the client is connecting to the correct URL, ws://localhost:8000/ws/chat.
- For API errors, check that your OPENAI_API_KEY environment variable is set correctly.
- Ensure your FastAPI server is running with uvicorn main:app --reload and that no firewall blocks WebSocket traffic.
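Two of the failures above (wrong URL scheme, missing key) can be caught before any connection attempt. A small preflight sketch, with an illustrative helper name that is not part of the original code:

```python
import os

def preflight(url):
    """Return a list of problems found before attempting a connection."""
    problems = []
    if not url.startswith(("ws://", "wss://")):
        problems.append("URL must use the ws:// or wss:// scheme")
    if not os.environ.get("OPENAI_API_KEY"):
        problems.append("OPENAI_API_KEY is not set")
    return problems

# An http:// URL is a common mistake when copying from browser examples.
issues = preflight("http://localhost:8000/ws/chat")
```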
Key takeaways
- Use FastAPI's WebSocket endpoint to enable real-time LLM chat interactions.
- Stream responses from client.chat.completions.create with stream=True for low-latency token delivery.
- Always get API keys from os.environ and never hardcode them in code.
- Handle WebSocketDisconnect exceptions to manage client disconnects gracefully.
- Client-side JavaScript is required to consume and display streamed WebSocket messages.