How to track LLM API costs with code
Quick answer
Use the usage field in the response from chat.completions.create or completions.create to get token counts, then multiply those counts by your model's per-token price to track costs programmatically in Python. This enables real-time API cost monitoring.
Prerequisites
- Python 3.8+
- OpenAI API key (free tier works)
- pip install "openai>=1.0"
Setup
Install the openai Python package and set your API key as an environment variable for secure authentication.
pip install openai

Output:
Collecting openai
  Downloading openai-1.x.x-py3-none-any.whl (xx kB)
Installing collected packages: openai
Successfully installed openai-1.x.x
Step by step
This example shows how to call the OpenAI gpt-4o chat model, extract token usage from the response, and calculate the cost based on the current pricing.
import os
from openai import OpenAI
# Initialize client
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])
# Define model and prompt
model = "gpt-4o"
messages = [{"role": "user", "content": "Explain recursion in simple terms."}]
# Call chat completion
response = client.chat.completions.create(model=model, messages=messages)
# Extract usage info
usage = response.usage
prompt_tokens = usage.prompt_tokens
completion_tokens = usage.completion_tokens
total_tokens = usage.total_tokens
# Illustrative pricing ($0.03 per 1K prompt tokens, $0.06 per 1K completion
# tokens); check OpenAI's pricing page for your model's current rates
prompt_price_per_1k = 0.03
completion_price_per_1k = 0.06
# Calculate cost
cost = (prompt_tokens / 1000) * prompt_price_per_1k + (completion_tokens / 1000) * completion_price_per_1k
print(f"Prompt tokens: {prompt_tokens}")
print(f"Completion tokens: {completion_tokens}")
print(f"Total tokens: {total_tokens}")
print(f"Estimated cost: ${cost:.6f}")

Output:
Prompt tokens: 15
Completion tokens: 45
Total tokens: 60
Estimated cost: $0.003150
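Because per-token rates differ by model, it helps to keep them in a lookup table rather than hard-coding them at each call site. A minimal sketch follows; the PRICES_PER_1K table and estimate_cost helper are illustrative names, and the rates mirror the examples in this article rather than current OpenAI pricing:

```python
# Illustrative per-1K-token rates keyed by model name; check OpenAI's
# pricing page for current values before relying on these numbers.
PRICES_PER_1K = {
    "gpt-4o":      {"prompt": 0.03,  "completion": 0.06},
    "gpt-4o-mini": {"prompt": 0.002, "completion": 0.004},
}

def estimate_cost(model: str, prompt_tokens: int, completion_tokens: int) -> float:
    """Return the estimated dollar cost of one API call."""
    rates = PRICES_PER_1K[model]
    return (prompt_tokens / 1000) * rates["prompt"] + (
        completion_tokens / 1000
    ) * rates["completion"]

print(f"${estimate_cost('gpt-4o', 15, 45):.6f}")  # matches the example above
```

Adding a new model is then a one-line change to the table instead of an edit to the tracking code.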
Common variations
- Use async calls with AsyncOpenAI and asyncio for non-blocking cost tracking.
- Adapt pricing for different models by updating per-token rates.
- Track costs across multiple calls by accumulating token usage.
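The last variation, accumulating usage across calls, can be sketched with a small tracker class. CostTracker is a hypothetical name, not part of the openai SDK, and the rates are the illustrative ones used earlier; the demo feeds it fake usage objects in place of real responses:

```python
from types import SimpleNamespace  # stand-in for response.usage in the demo

class CostTracker:
    """Accumulate token usage across calls and report a running cost."""

    def __init__(self, prompt_price_per_1k, completion_price_per_1k):
        self.prompt_price_per_1k = prompt_price_per_1k
        self.completion_price_per_1k = completion_price_per_1k
        self.prompt_tokens = 0
        self.completion_tokens = 0

    def add(self, usage):
        # Pass response.usage from each chat.completions.create call
        self.prompt_tokens += usage.prompt_tokens
        self.completion_tokens += usage.completion_tokens

    @property
    def cost(self):
        return (self.prompt_tokens / 1000) * self.prompt_price_per_1k + (
            self.completion_tokens / 1000
        ) * self.completion_price_per_1k

# Demo with fake usage objects standing in for real API responses
tracker = CostTracker(prompt_price_per_1k=0.03, completion_price_per_1k=0.06)
tracker.add(SimpleNamespace(prompt_tokens=15, completion_tokens=45))
tracker.add(SimpleNamespace(prompt_tokens=10, completion_tokens=20))
print(f"Running cost: ${tracker.cost:.6f}")
```

In real code you would call tracker.add(response.usage) after every request and log tracker.cost periodically.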
import asyncio
import os

from openai import AsyncOpenAI

async def track_cost_async():
    # Use the async client so the request can be awaited
    client = AsyncOpenAI(api_key=os.environ["OPENAI_API_KEY"])
    model = "gpt-4o-mini"
    messages = [{"role": "user", "content": "Summarize AI trends."}]
    response = await client.chat.completions.create(model=model, messages=messages)
    usage = response.usage
    prompt_tokens = usage.prompt_tokens
    completion_tokens = usage.completion_tokens
    # Example pricing for gpt-4o-mini; check the pricing page for current rates
    prompt_price_per_1k = 0.002
    completion_price_per_1k = 0.004
    cost = (prompt_tokens / 1000) * prompt_price_per_1k + (completion_tokens / 1000) * completion_price_per_1k
    print(f"Async prompt tokens: {prompt_tokens}")
    print(f"Async completion tokens: {completion_tokens}")
    print(f"Async estimated cost: ${cost:.6f}")

asyncio.run(track_cost_async())

Output:
Async prompt tokens: 10
Async completion tokens: 20
Async estimated cost: $0.000100
Troubleshooting
- If usage is missing, ensure your SDK and API version support usage reporting.
- Check that your environment variable OPENAI_API_KEY is set correctly.
- Verify you are using a supported model that returns usage data.
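When debugging a missing usage field, a defensive accessor keeps your tracking code from crashing on an AttributeError or a None value. safe_usage is a hypothetical helper, and the demo uses simulated response objects; note that streamed chat completions only report usage when the request opts in (in current SDK versions, via stream_options={"include_usage": True}):

```python
from types import SimpleNamespace  # used below to simulate responses

def safe_usage(response):
    """Return (prompt_tokens, completion_tokens), or (0, 0) if usage is absent."""
    usage = getattr(response, "usage", None)
    if usage is None:
        return 0, 0
    return usage.prompt_tokens, usage.completion_tokens

# Simulated responses: one with usage data, one without
with_usage = SimpleNamespace(usage=SimpleNamespace(prompt_tokens=15, completion_tokens=45))
without_usage = SimpleNamespace(usage=None)
print(safe_usage(with_usage))     # (15, 45)
print(safe_usage(without_usage))  # (0, 0)
```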
Key Takeaways
- Extract token usage from the usage field in API responses to monitor costs.
- Multiply token counts by your model's per-token price to calculate real-time expenses.
- Use async calls and accumulate usage for efficient cost tracking in production.
- Keep your API key secure via environment variables to avoid leaks.
- Verify SDK and model support usage reporting to avoid missing cost data.