How to track LLM API costs with code
Quick answer
Use the usage field in the response from chat.completions.create or completions.create to get token counts, then multiply those counts by your model's per-token price to track costs programmatically in Python. This enables real-time API cost monitoring.
Prerequisites
- Python 3.8+
- OpenAI API key (free tier works)
- pip install "openai>=1.0"
Setup
Install the openai Python package and set your API key as an environment variable for secure authentication.
pip install openai

Output:
Collecting openai
  Downloading openai-1.x.x-py3-none-any.whl (xx kB)
Installing collected packages: openai
Successfully installed openai-1.x.x
Step by step
This example shows how to call the OpenAI gpt-4o chat model, extract token usage from the response, and calculate the cost based on the current pricing.
import os
from openai import OpenAI
# Initialize client
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])
# Define model and prompt
model = "gpt-4o"
messages = [{"role": "user", "content": "Explain recursion in simple terms."}]
# Call chat completion
response = client.chat.completions.create(model=model, messages=messages)
# Extract usage info
usage = response.usage
prompt_tokens = usage.prompt_tokens
completion_tokens = usage.completion_tokens
total_tokens = usage.total_tokens
# Illustrative pricing ($0.03 per 1K prompt tokens, $0.06 per 1K completion
# tokens); check OpenAI's pricing page for your model's current rates
prompt_price_per_1k = 0.03
completion_price_per_1k = 0.06
# Calculate cost
cost = (prompt_tokens / 1000) * prompt_price_per_1k + (completion_tokens / 1000) * completion_price_per_1k
print(f"Prompt tokens: {prompt_tokens}")
print(f"Completion tokens: {completion_tokens}")
print(f"Total tokens: {total_tokens}")
print(f"Estimated cost: ${cost:.6f}")

Output:
Prompt tokens: 15
Completion tokens: 45
Total tokens: 60
Estimated cost: $0.003150
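Because per-token rates differ by model, it helps to keep them in a lookup table rather than hard-coding them at each call site. A minimal sketch follows; the PRICES_PER_1K table and estimate_cost helper are illustrative names, and the rates mirror the examples in this article rather than current OpenAI pricing:

```python
# Illustrative per-1K-token rates keyed by model name; check OpenAI's
# pricing page for current values before relying on these numbers.
PRICES_PER_1K = {
    "gpt-4o":      {"prompt": 0.03,  "completion": 0.06},
    "gpt-4o-mini": {"prompt": 0.002, "completion": 0.004},
}

def estimate_cost(model: str, prompt_tokens: int, completion_tokens: int) -> float:
    """Return the estimated dollar cost of one API call."""
    rates = PRICES_PER_1K[model]
    return (prompt_tokens / 1000) * rates["prompt"] + (
        completion_tokens / 1000
    ) * rates["completion"]

print(f"${estimate_cost('gpt-4o', 15, 45):.6f}")  # matches the example above
```

Adding a new model is then a one-line change to the table instead of an edit to the tracking code.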
Common variations
- Use async calls with AsyncOpenAI and asyncio for non-blocking cost tracking.
- Adapt pricing for different models by updating per-token rates.
- Track costs across multiple calls by accumulating token usage.
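The last variation, accumulating usage across calls, can be sketched with a small tracker class. CostTracker is a hypothetical name, not part of the openai SDK, and the rates are the illustrative ones used earlier; the demo feeds it fake usage objects in place of real responses:

```python
from types import SimpleNamespace  # stand-in for response.usage in the demo

class CostTracker:
    """Accumulate token usage across calls and report a running cost."""

    def __init__(self, prompt_price_per_1k, completion_price_per_1k):
        self.prompt_price_per_1k = prompt_price_per_1k
        self.completion_price_per_1k = completion_price_per_1k
        self.prompt_tokens = 0
        self.completion_tokens = 0

    def add(self, usage):
        # Pass response.usage from each chat.completions.create call
        self.prompt_tokens += usage.prompt_tokens
        self.completion_tokens += usage.completion_tokens

    @property
    def cost(self):
        return (self.prompt_tokens / 1000) * self.prompt_price_per_1k + (
            self.completion_tokens / 1000
        ) * self.completion_price_per_1k

# Demo with fake usage objects standing in for real API responses
tracker = CostTracker(prompt_price_per_1k=0.03, completion_price_per_1k=0.06)
tracker.add(SimpleNamespace(prompt_tokens=15, completion_tokens=45))
tracker.add(SimpleNamespace(prompt_tokens=10, completion_tokens=20))
print(f"Running cost: ${tracker.cost:.6f}")
```

In real code you would call tracker.add(response.usage) after every request and log tracker.cost periodically.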
import asyncio
import os

from openai import AsyncOpenAI

async def track_cost_async():
    # Use the async client so the request can be awaited
    client = AsyncOpenAI(api_key=os.environ["OPENAI_API_KEY"])
    model = "gpt-4o-mini"
    messages = [{"role": "user", "content": "Summarize AI trends."}]
    response = await client.chat.completions.create(model=model, messages=messages)
    usage = response.usage
    prompt_tokens = usage.prompt_tokens
    completion_tokens = usage.completion_tokens
    # Example pricing for gpt-4o-mini; check the pricing page for current rates
    prompt_price_per_1k = 0.002
    completion_price_per_1k = 0.004
    cost = (prompt_tokens / 1000) * prompt_price_per_1k + (completion_tokens / 1000) * completion_price_per_1k
    print(f"Async prompt tokens: {prompt_tokens}")
    print(f"Async completion tokens: {completion_tokens}")
    print(f"Async estimated cost: ${cost:.6f}")

asyncio.run(track_cost_async())

Output:
Async prompt tokens: 10
Async completion tokens: 20
Async estimated cost: $0.000100
Troubleshooting
- If usage is missing, ensure your SDK and API version support usage reporting.
- Check that your environment variable OPENAI_API_KEY is set correctly.
- Verify you are using a supported model that returns usage data.
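When debugging a missing usage field, a defensive accessor keeps your tracking code from crashing on an AttributeError or a None value. safe_usage is a hypothetical helper, and the demo uses simulated response objects; note that streamed chat completions only report usage when the request opts in (in current SDK versions, via stream_options={"include_usage": True}):

```python
from types import SimpleNamespace  # used below to simulate responses

def safe_usage(response):
    """Return (prompt_tokens, completion_tokens), or (0, 0) if usage is absent."""
    usage = getattr(response, "usage", None)
    if usage is None:
        return 0, 0
    return usage.prompt_tokens, usage.completion_tokens

# Simulated responses: one with usage data, one without
with_usage = SimpleNamespace(usage=SimpleNamespace(prompt_tokens=15, completion_tokens=45))
without_usage = SimpleNamespace(usage=None)
print(safe_usage(with_usage))     # (15, 45)
print(safe_usage(without_usage))  # (0, 0)
```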
Key Takeaways
- Extract token usage from the usage field in API responses to monitor costs.
- Multiply token counts by your model's per-token price to calculate real-time expenses.
- Use async calls and accumulate usage for efficient cost tracking in production.
- Keep your API key secure via environment variables to avoid leaks.
- Verify SDK and model support usage reporting to avoid missing cost data.