
How to add WebSocket support for LLM chat in FastAPI

Quick answer
Use FastAPI's WebSocket class to create an endpoint that handles bidirectional communication. Inside the handler, call chat.completions.create on the OpenAI SDK's async client (AsyncOpenAI) with stream=True to send user messages and stream AI responses back in real time.

PREREQUISITES

  • Python 3.8+
  • OpenAI API key with available credit
  • pip install fastapi uvicorn "openai>=1.0"

Setup

Install the required packages and set your OpenAI API key as an environment variable.

bash
pip install fastapi uvicorn "openai>=1.0"

# Set the environment variable in your shell (replace with your actual key)
export OPENAI_API_KEY="sk-..."

Step by step

Create a FastAPI app with a WebSocket endpoint that receives user messages, calls the OpenAI gpt-4o model, and sends back streamed responses over the WebSocket connection.

python
import os
from fastapi import FastAPI, WebSocket, WebSocketDisconnect
from openai import AsyncOpenAI

app = FastAPI()
# AsyncOpenAI returns an async stream, so it can be iterated with `async for`
client = AsyncOpenAI(api_key=os.environ["OPENAI_API_KEY"])

@app.websocket("/ws/chat")
async def websocket_chat(websocket: WebSocket):
    await websocket.accept()
    try:
        while True:
            data = await websocket.receive_text()
            # Prepare messages for the chat completion
            messages = [{"role": "user", "content": data}]

            # Create a streaming chat completion (note the await)
            response = await client.chat.completions.create(
                model="gpt-4o",
                messages=messages,
                stream=True
            )

            # Stream tokens back to the client as they arrive
            async for chunk in response:
                delta = chunk.choices[0].delta.content if chunk.choices else None
                if delta:
                    await websocket.send_text(delta)

            # Send a special message to indicate completion
            await websocket.send_text("<END>")

    except WebSocketDisconnect:
        print("Client disconnected")
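The handler above is single-turn: each incoming message is sent to the model on its own. A multi-turn variant would keep a running history per connection and pass the whole list as messages on every call. A minimal sketch of that bookkeeping (the helper names are illustrative, not part of FastAPI or the OpenAI SDK):

```python
# Per-connection conversation history for a multi-turn chat.
# Each entry follows the OpenAI messages format: {"role": ..., "content": ...}.

def add_user_message(history, text):
    history.append({"role": "user", "content": text})

def add_assistant_message(history, text):
    history.append({"role": "assistant", "content": text})

history = []
add_user_message(history, "What is FastAPI?")
add_assistant_message(history, "FastAPI is a Python web framework.")
add_user_message(history, "Does it support WebSockets?")

# The full history (not just the last message) would be passed as messages=history
print(len(history))  # → 3
print(history[-1]["role"])  # → user
```

In the WebSocket handler, the history list would live inside the connection's while loop so each client keeps its own conversation.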

Common variations

  • Use async for with streaming to send partial responses as they arrive.
  • Switch to other OpenAI models such as gpt-4.1 by changing the model parameter; Anthropic models like claude-3-5-sonnet-20241022 require the Anthropic SDK (or an OpenAI-compatible endpoint) rather than a model-name swap alone.
  • Implement client-side JavaScript to connect to the WebSocket and display streamed tokens in real time.
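The client-side consumption pattern from the last bullet can also be sketched in Python using the third-party websockets package (pip install websockets). The URL and prompt below are illustrative and assume the server from the step above is running locally:

```python
import asyncio

async def stream_chat(prompt, url="ws://localhost:8000/ws/chat"):
    # Imported inside the function so the sketch parses even
    # before `pip install websockets` has been run.
    import websockets

    async with websockets.connect(url) as ws:
        await ws.send(prompt)
        tokens = []
        while True:
            msg = await ws.recv()
            if msg == "<END>":  # sentinel the server sends after each reply
                break
            tokens.append(msg)
        return "".join(tokens)

# Usage (with the server running): asyncio.run(stream_chat("Hello!"))
```

A browser client would follow the same shape with the native WebSocket API: send the prompt, append each incoming message to the page, and stop on the <END> sentinel.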

Troubleshooting

  • If the WebSocket connection fails, verify the client is connecting to the correct URL ws://localhost:8000/ws/chat.
  • For API errors, check your OPENAI_API_KEY environment variable is set correctly.
  • Ensure your FastAPI server is running with uvicorn main:app --reload and that no firewall blocks WebSocket traffic.
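For the API-key bullet, a quick sanity check can be run from Python before starting the server. The check_api_key helper and the "sk-" prefix convention are illustrative assumptions, not part of the OpenAI SDK:

```python
import os

def check_api_key(env=None):
    """Return True if an OpenAI-style key appears to be configured."""
    env = os.environ if env is None else env
    key = env.get("OPENAI_API_KEY", "")
    # OpenAI keys conventionally start with "sk-"; adjust if yours differs
    return key.startswith("sk-") and len(key) > 20

print(check_api_key({"OPENAI_API_KEY": "sk-" + "x" * 30}))  # → True
print(check_api_key({}))  # → False
```

Call check_api_key() with no argument to test the real environment before launching uvicorn.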

Key Takeaways

  • Use FastAPI's WebSocket endpoint to enable real-time LLM chat interactions.
  • Stream responses from client.chat.completions.create with stream=True for low-latency token delivery.
  • Always get API keys from os.environ and never hardcode them in code.
  • Handle WebSocketDisconnect exceptions to manage client disconnects gracefully.
  • Client-side JavaScript is required to consume and display streamed WebSocket messages.
Verified 2026-04 · gpt-4o, gpt-4.1, claude-3-5-sonnet-20241022