
How to stream Claude responses with FastAPI

Quick answer
Call the Anthropic Python SDK's messages.create() with stream=True to receive streaming responses from claude-3-5-sonnet-20241022, then wrap the resulting event stream in FastAPI's StreamingResponse to forward tokens to clients in real time. Inside async endpoints, use the async client, anthropic.AsyncAnthropic.

PREREQUISITES

  • Python 3.8+
  • Anthropic API key
  • pip install anthropic fastapi uvicorn

Setup

Install the required packages and set your Anthropic API key as an environment variable.

  • Install packages: pip install anthropic fastapi uvicorn
  • Set environment variable: export ANTHROPIC_API_KEY='your_api_key_here' (Linux/macOS) or set ANTHROPIC_API_KEY=your_api_key_here (Windows)

Step by step

This example shows a complete FastAPI app that streams Claude responses token by token using the Anthropic SDK's streaming Messages API. The /chat endpoint accepts a JSON payload with a user message and streams the model's reply as server-sent events (SSE).

python
import os
import json
from fastapi import FastAPI, Request
from fastapi.responses import StreamingResponse
import anthropic

app = FastAPI()
client = anthropic.AsyncAnthropic(api_key=os.environ["ANTHROPIC_API_KEY"])

async def stream_claude_response(user_message: str):
    # Prepare the messages list
    messages = [{"role": "user", "content": user_message}]

    # Call the streaming Messages API; with AsyncAnthropic and stream=True,
    # this returns an async iterator of typed streaming events
    stream = await client.messages.create(
        model="claude-3-5-sonnet-20241022",
        system="You are a helpful assistant.",
        messages=messages,
        stream=True,
        max_tokens=500
    )

    # Forward incremental text deltas as SSE data lines
    async for event in stream:
        if event.type == "content_block_delta" and event.delta.type == "text_delta":
            yield f"data: {json.dumps(event.delta.text)}\n\n"

@app.post("/chat")
async def chat(request: Request):
    data = await request.json()
    user_message = data.get("message", "")
    if not user_message:
        return {"error": "No message provided"}

    return StreamingResponse(
        stream_claude_response(user_message),
        media_type="text/event-stream"
    )

# To run:
# uvicorn filename:app --reload
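The data: {json.dumps(...)}\n\n framing matters: SSE messages are delimited by blank lines, so JSON-encoding each chunk keeps any embedded newlines from terminating an event early. A minimal sketch (the sse_event helper is illustrative, not part of the app above):

```python
import json

def sse_event(text: str) -> str:
    # JSON-encode the chunk so embedded newlines can't break SSE framing
    return f"data: {json.dumps(text)}\n\n"

print(sse_event("line one\nline two"))
```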

Common variations

You can also write the streaming generator synchronously: Starlette (which FastAPI builds on) iterates a sync generator passed to StreamingResponse in a thread pool, so it won't block the event loop. You can switch models by changing the model parameter, and for non-streaming responses, omit stream=True and handle the full Message object at once.

python
# A separate synchronous client for the sync generator
sync_client = anthropic.Anthropic(api_key=os.environ["ANTHROPIC_API_KEY"])

def stream_sync(user_message: str):
    messages = [{"role": "user", "content": user_message}]
    stream = sync_client.messages.create(
        model="claude-3-5-sonnet-20241022",
        system="You are a helpful assistant.",
        messages=messages,
        stream=True,
        max_tokens=500
    )
    for event in stream:
        if event.type == "content_block_delta" and event.delta.type == "text_delta":
            yield f"data: {json.dumps(event.delta.text)}\n\n"

@app.post("/chat-sync")
async def chat_sync(request: Request):
    # Request.json() is a coroutine, so the endpoint itself stays async;
    # the sync generator below is iterated in a thread pool by Starlette
    data = await request.json()
    user_message = data.get("message", "")
    return StreamingResponse(stream_sync(user_message), media_type="text/event-stream")
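On the client side, the streamed data: lines can be reassembled into the full reply. A minimal sketch of decoding the event payloads this endpoint emits (pure string handling, no network involved):

```python
import json

def parse_sse(payload: str) -> str:
    # Collect and decode the JSON-encoded text chunk from each "data:" line
    chunks = []
    for line in payload.splitlines():
        if line.startswith("data: "):
            chunks.append(json.loads(line[len("data: "):]))
    return "".join(chunks)

events = 'data: "Hello"\n\ndata: ", world"\n\n'
print(parse_sse(events))  # → Hello, world
```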

Troubleshooting

  • Empty or no response: Check your API key is set correctly in ANTHROPIC_API_KEY.
  • Streaming hangs: Ensure your client supports Server-Sent Events (SSE) and the FastAPI server is running with uvicorn.
  • Rate limits or errors: Handle exceptions around client.messages.create() and implement retries or backoff.
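The retry advice above can be sketched as a small exponential-backoff wrapper. In production you would catch the SDK's specific exceptions (e.g., anthropic.RateLimitError) rather than bare Exception; the flaky helper below is a stand-in for the API call, used only to demonstrate the retry behavior:

```python
import time

def with_backoff(fn, max_retries=3, base_delay=0.1):
    # Retry fn() with exponential backoff; re-raise after the final attempt
    for attempt in range(max_retries):
        try:
            return fn()
        except Exception:
            if attempt == max_retries - 1:
                raise
            time.sleep(base_delay * (2 ** attempt))

# Demo: a fake flaky call that fails twice, then succeeds
calls = {"n": 0}
def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("rate limited")
    return "ok"

print(with_backoff(flaky))  # → ok
```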

Key Takeaways

  • Use stream=True with anthropic.Anthropic.messages.create() to get streaming responses from Claude.
  • Integrate streaming with FastAPI using StreamingResponse and server-sent events for real-time token delivery.
  • Always set your Anthropic API key in ANTHROPIC_API_KEY environment variable to authenticate requests.
Verified 2026-04 · claude-3-5-sonnet-20241022