
How to add WebSocket support for LLM chat in FastAPI

Quick answer
Use FastAPI's WebSocket class to create an endpoint that handles bidirectional communication. Inside the handler, call chat.completions.create on the OpenAI SDK's async client (AsyncOpenAI) with stream=True to send user messages and stream AI responses back in real time.

PREREQUISITES

  • Python 3.8+
  • OpenAI API key with available credit
  • pip install fastapi uvicorn "openai>=1.0"

Setup

Install the required packages and set your OpenAI API key as an environment variable.

bash
pip install fastapi uvicorn "openai>=1.0"

# Set the environment variable in your shell (replace with your actual key)
export OPENAI_API_KEY="sk-..."

Step by step

Create a FastAPI app with a WebSocket endpoint that receives user messages, calls the OpenAI gpt-4o model, and sends back streamed responses over the WebSocket connection.

python
import os
from fastapi import FastAPI, WebSocket, WebSocketDisconnect
from openai import AsyncOpenAI

app = FastAPI()
# AsyncOpenAI returns an async stream, so it can be iterated with `async for`
client = AsyncOpenAI(api_key=os.environ["OPENAI_API_KEY"])

@app.websocket("/ws/chat")
async def websocket_chat(websocket: WebSocket):
    await websocket.accept()
    try:
        while True:
            data = await websocket.receive_text()
            # Prepare messages for the chat completion
            messages = [{"role": "user", "content": data}]

            # Create a streaming chat completion (note the await)
            response = await client.chat.completions.create(
                model="gpt-4o",
                messages=messages,
                stream=True
            )

            # Stream tokens back to the client as they arrive
            async for chunk in response:
                delta = chunk.choices[0].delta.content if chunk.choices else None
                if delta:
                    await websocket.send_text(delta)

            # Send a special message to indicate completion
            await websocket.send_text("<END>")

    except WebSocketDisconnect:
        print("Client disconnected")
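The handler above is single-turn: each incoming message is sent to the model on its own. A multi-turn variant would keep a running history per connection and pass the whole list as messages on every call. A minimal sketch of that bookkeeping (the helper names are illustrative, not part of FastAPI or the OpenAI SDK):

```python
# Per-connection conversation history for a multi-turn chat.
# Each entry follows the OpenAI messages format: {"role": ..., "content": ...}.

def add_user_message(history, text):
    history.append({"role": "user", "content": text})

def add_assistant_message(history, text):
    history.append({"role": "assistant", "content": text})

history = []
add_user_message(history, "What is FastAPI?")
add_assistant_message(history, "FastAPI is a Python web framework.")
add_user_message(history, "Does it support WebSockets?")

# The full history (not just the last message) would be passed as messages=history
print(len(history))  # → 3
print(history[-1]["role"])  # → user
```

In the WebSocket handler, the history list would live inside the connection's while loop so each client keeps its own conversation.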

Common variations

  • Use async for with streaming to send partial responses as they arrive.
  • Switch to other OpenAI models such as gpt-4.1 by changing the model parameter; Anthropic models like claude-3-5-sonnet-20241022 require the Anthropic SDK (or an OpenAI-compatible endpoint) rather than a model-name swap alone.
  • Implement client-side JavaScript to connect to the WebSocket and display streamed tokens in real time.
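The client-side consumption pattern from the last bullet can also be sketched in Python using the third-party websockets package (pip install websockets). The URL and prompt below are illustrative and assume the server from the step above is running locally:

```python
import asyncio

async def stream_chat(prompt, url="ws://localhost:8000/ws/chat"):
    # Imported inside the function so the sketch parses even
    # before `pip install websockets` has been run.
    import websockets

    async with websockets.connect(url) as ws:
        await ws.send(prompt)
        tokens = []
        while True:
            msg = await ws.recv()
            if msg == "<END>":  # sentinel the server sends after each reply
                break
            tokens.append(msg)
        return "".join(tokens)

# Usage (with the server running): asyncio.run(stream_chat("Hello!"))
```

A browser client would follow the same shape with the native WebSocket API: send the prompt, append each incoming message to the page, and stop on the <END> sentinel.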

Troubleshooting

  • If the WebSocket connection fails, verify the client is connecting to the correct URL ws://localhost:8000/ws/chat.
  • For API errors, check your OPENAI_API_KEY environment variable is set correctly.
  • Ensure your FastAPI server is running with uvicorn main:app --reload and that no firewall blocks WebSocket traffic.
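For the API-key bullet, a quick sanity check can be run from Python before starting the server. The check_api_key helper and the "sk-" prefix convention are illustrative assumptions, not part of the OpenAI SDK:

```python
import os

def check_api_key(env=None):
    """Return True if an OpenAI-style key appears to be configured."""
    env = os.environ if env is None else env
    key = env.get("OPENAI_API_KEY", "")
    # OpenAI keys conventionally start with "sk-"; adjust if yours differs
    return key.startswith("sk-") and len(key) > 20

print(check_api_key({"OPENAI_API_KEY": "sk-" + "x" * 30}))  # → True
print(check_api_key({}))  # → False
```

Call check_api_key() with no argument to test the real environment before launching uvicorn.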

Key Takeaways

  • Use FastAPI's WebSocket endpoint to enable real-time LLM chat interactions.
  • Stream responses from client.chat.completions.create with stream=True for low-latency token delivery.
  • Always get API keys from os.environ and never hardcode them in code.
  • Handle WebSocketDisconnect exceptions to manage client disconnects gracefully.
  • Client-side JavaScript is required to consume and display streamed WebSocket messages.
Verified 2026-04 · gpt-4o, gpt-4.1, claude-3-5-sonnet-20241022