How-to · Intermediate · 4 min read

How to stream LLM to React frontend

Quick answer
Use the OpenAI SDK's chat.completions.create method with stream=True in a Python backend (e.g., FastAPI) to stream tokens. Then, forward these tokens via Server-Sent Events (SSE) to your React frontend, which listens and appends the streamed content in real time.
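On the wire, each SSE event is a `data:` payload line terminated by a blank line, and the browser's EventSource fires one `onmessage` per event. A minimal sketch of that framing (the `sse_frame` helper is illustrative, not part of any SDK):

```python
# Each SSE event: a "data:" payload line followed by a blank line.
def sse_frame(token: str) -> str:
    return f"data: {token}\n\n"

# The backend emits one frame per streamed token...
frames = "".join(sse_frame(t) for t in ["Hello", ",", " world"])

# ...and EventSource hands each payload back to the browser as e.data.
events = [f[len("data: "):] for f in frames.split("\n\n") if f]
print(events)  # ['Hello', ',', ' world']
```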

PREREQUISITES

  • Python 3.8+
  • OpenAI API key (free tier works)
  • pip install "openai>=1.0" fastapi uvicorn (quote the version pin so the shell does not treat > as a redirect)
  • React 18+ environment

Set up the backend streaming server

Install the required Python packages and set your OpenAI API key as an environment variable. Then use FastAPI to create an SSE endpoint that streams tokens from the OpenAI chat.completions.create method with stream=True.
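For example, export the key in your shell before launching the server with `uvicorn main:app --reload` (the value below is a placeholder; substitute your real key):

```shell
# Make the key available to the backend process; never hardcode or commit it
export OPENAI_API_KEY="sk-your-key-here"
```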

bash
pip install openai fastapi uvicorn
output
Collecting openai
Collecting fastapi
Collecting uvicorn
Successfully installed openai fastapi uvicorn

Step-by-step: backend and React frontend

This example shows a minimal FastAPI backend streaming OpenAI chat completions and a React frontend consuming the stream via EventSource.

python
import os
from fastapi import FastAPI
from fastapi.responses import StreamingResponse
from openai import OpenAI

app = FastAPI()
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

def event_generator(messages):
    # With stream=True the sync client returns a regular iterator of chunks;
    # StreamingResponse runs a sync generator in a threadpool automatically.
    stream = client.chat.completions.create(
        model="gpt-4o",
        messages=messages,
        stream=True,
    )
    for chunk in stream:
        content = chunk.choices[0].delta.content
        if content:
            yield f"data: {content}\n\n"

@app.get("/stream")
async def stream():
    messages = [{"role": "user", "content": "Explain streaming LLM to React frontend."}]
    return StreamingResponse(event_generator(messages), media_type="text/event-stream")

The React frontend (App.js) consumes the stream with EventSource:

jsx
// App.js — appends each SSE payload to the displayed text
import React, { useEffect, useState } from 'react';

function App() {
  const [text, setText] = useState('');
  useEffect(() => {
    const eventSource = new EventSource('http://localhost:8000/stream');
    eventSource.onmessage = e => {
      setText(prev => prev + e.data);
    };
    return () => eventSource.close();
  }, []);
  return <div><h1>Streaming LLM Output</h1><pre>{text}</pre></div>;
}

export default App;

Starting the backend with uvicorn main:app prints:
output
INFO:     Started server process [12345]
INFO:     Waiting for application startup.
INFO:     Application startup complete.
INFO:     Uvicorn running on http://127.0.0.1:8000 (Press CTRL+C to quit)

Common variations

  • Use async client calls for better concurrency.
  • Switch model to gpt-4o-mini or claude-sonnet-4-5 depending on provider.
  • Use WebSocket instead of SSE for bidirectional streaming.

Troubleshooting streaming issues

  • If the stream hangs, check your API key and network connectivity.
  • Ensure CORS is configured on FastAPI to allow your React frontend origin.
  • Use browser devtools to verify EventSource connection and data flow.

Key Takeaways

  • Use OpenAI SDK's stream=True to get token-by-token output from LLMs.
  • Forward streamed tokens via SSE from Python backend to React frontend for real-time display.
  • React's EventSource API is simple and effective for consuming SSE streams.
  • Configure CORS and environment variables properly to avoid common connection issues.
Verified 2026-04 · gpt-4o, gpt-4o-mini, claude-sonnet-4-5