How-to · Beginner · 3 min read

How to track LLM token usage in FastAPI

Quick answer
Use the usage field on the response object returned by the OpenAI SDK v1. In a FastAPI endpoint, read this field after calling client.chat.completions.create() and log or return it for monitoring or billing.

Prerequisites

  • Python 3.8+
  • OpenAI API key (free tier works)
  • pip install "openai>=1.0"
  • pip install fastapi uvicorn

Setup

Install required packages and set your OpenAI API key as an environment variable.

  • Install FastAPI and Uvicorn for the web server.
  • Install the OpenAI SDK v1 for API calls.
  • Set OPENAI_API_KEY in your environment.
bash
pip install fastapi uvicorn "openai>=1.0"

Step by step

This example shows a FastAPI app that calls the OpenAI gpt-4o model and returns both the generated text and token usage details.

python
import os
from fastapi import FastAPI
from pydantic import BaseModel
from openai import AsyncOpenAI

app = FastAPI()
client = AsyncOpenAI(api_key=os.environ["OPENAI_API_KEY"])

class PromptRequest(BaseModel):
    prompt: str

@app.post("/generate")
async def generate_text(request: PromptRequest):
    # AsyncOpenAI avoids blocking the event loop inside an async endpoint
    response = await client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": request.prompt}]
    )
    text = response.choices[0].message.content
    usage = response.usage  # prompt, completion, and total token counts
    return {
        "text": text,
        "token_usage": {
            "prompt_tokens": usage.prompt_tokens,
            "completion_tokens": usage.completion_tokens,
            "total_tokens": usage.total_tokens
        }
    }
output
{
  "text": "Hello, how can I assist you today?",
  "token_usage": {
    "prompt_tokens": 10,
    "completion_tokens": 15,
    "total_tokens": 25
  }
}
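Per-request usage is enough for debugging, but for billing you usually want a running total. A minimal in-memory sketch (UsageTracker is our own helper, not part of any SDK; in production, persist to a database or metrics backend instead):

```python
from collections import defaultdict

class UsageTracker:
    """Accumulates token counts per model in memory (lost on restart)."""
    def __init__(self):
        self.totals = defaultdict(
            lambda: {"prompt_tokens": 0, "completion_tokens": 0, "total_tokens": 0}
        )

    def record(self, model: str, prompt_tokens: int, completion_tokens: int) -> None:
        t = self.totals[model]
        t["prompt_tokens"] += prompt_tokens
        t["completion_tokens"] += completion_tokens
        t["total_tokens"] += prompt_tokens + completion_tokens

tracker = UsageTracker()
# In the endpoint, after the API call:
#     tracker.record("gpt-4o", usage.prompt_tokens, usage.completion_tokens)
```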

Common variations

You can track token usage the same way with other models such as gpt-4.1 or Anthropic's claude-3-5-sonnet-20241022. In async endpoints, prefer the async client classes (AsyncOpenAI, AsyncAnthropic) so API calls do not block the event loop. For streaming, usage is only available once the stream finishes; with OpenAI, request it via stream_options={"include_usage": True}.

python
import os
from fastapi import FastAPI
from pydantic import BaseModel
from anthropic import AsyncAnthropic

app = FastAPI()
client = AsyncAnthropic(api_key=os.environ["ANTHROPIC_API_KEY"])

class PromptRequest(BaseModel):
    prompt: str

@app.post("/generate_claude")
async def generate_claude(request: PromptRequest):
    message = await client.messages.create(
        model="claude-3-5-sonnet-20241022",
        max_tokens=1024,
        system="You are a helpful assistant.",
        messages=[{"role": "user", "content": request.prompt}]
    )
    # Anthropic reports usage as input_tokens/output_tokens
    # rather than prompt_tokens/completion_tokens
    usage = message.usage
    return {
        "text": message.content[0].text,
        "token_usage": {
            "input_tokens": usage.input_tokens,
            "output_tokens": usage.output_tokens
        }
    }

Troubleshooting

  • If usage is missing in the response, verify you are using the latest OpenAI SDK v1 and a supported model.
  • Ensure your API key is set correctly in os.environ.
  • For streaming, token usage may not be available until the stream completes.
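For the API-key check in particular, failing fast at startup gives a clearer error than a 401 from the API mid-request. A small sketch (require_api_key is a hypothetical helper, not part of any SDK):

```python
import os

def require_api_key(name: str = "OPENAI_API_KEY") -> str:
    """Return the named key, or raise a clear error if it is unset or empty."""
    key = os.environ.get(name, "").strip()
    if not key:
        raise RuntimeError(f"{name} is not set; export it before starting the server")
    return key

# client = OpenAI(api_key=require_api_key())
```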

Key Takeaways

  • Use the usage field from the OpenAI SDK v1 response to get token counts.
  • Integrate token usage tracking directly in your FastAPI endpoint for real-time monitoring.
  • Different LLM providers and SDKs may expose token usage differently; check their docs.
  • Always secure your API keys via environment variables, never hardcode them.
  • Streaming responses require special handling to accumulate token usage.
Verified 2026-04 · gpt-4o, gpt-4.1, claude-3-5-sonnet-20241022