How to estimate LLM API costs
Estimate LLM API costs by multiplying the number of tokens processed (input + output) by the model's per-token price. Use the usage data returned in API responses or logs, and factor in model choice, since prices vary by model and provider.

Prerequisites
- Python 3.8+
- OpenAI API key (free tier works)
- pip install "openai>=1.0"
Setup
Install the official openai Python SDK and set your API key as an environment variable for secure access.
pip install "openai>=1.0"

Collecting openai
Downloading openai-1.x.x-py3-none-any.whl
Installing collected packages: openai
Successfully installed openai-1.x.x
Step by step
Use the API response's usage field to get token counts, then multiply by the model's per-token price to estimate cost.
import os
from openai import OpenAI
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])
# Example chat completion call
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello, how much will this cost?"}],
)
# Extract token usage
usage = response.usage
prompt_tokens = usage.prompt_tokens
completion_tokens = usage.completion_tokens
total_tokens = usage.total_tokens
# Blended price per 1K tokens for gpt-4o (illustrative only; real price
# sheets list input and output tokens separately -- check current pricing)
price_per_1k_tokens = 0.03  # USD
# Calculate cost
cost = (total_tokens / 1000) * price_per_1k_tokens
print(f"Prompt tokens: {prompt_tokens}")
print(f"Completion tokens: {completion_tokens}")
print(f"Total tokens: {total_tokens}")
print(f"Estimated cost: ${cost:.6f}")

Prompt tokens: 10
Completion tokens: 15
Total tokens: 25
Estimated cost: $0.000750
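Most providers actually price input (prompt) and output (completion) tokens at different rates, so a slightly more accurate estimate prices the two token types separately. A minimal sketch of such a helper follows; the price values are placeholders for illustration, not real price-sheet figures:

```python
def estimate_cost(prompt_tokens: int, completion_tokens: int,
                  input_price_per_1k: float, output_price_per_1k: float) -> float:
    """Estimate cost in USD, pricing prompt and completion tokens separately."""
    return (prompt_tokens / 1000) * input_price_per_1k \
        + (completion_tokens / 1000) * output_price_per_1k

# Placeholder prices for illustration only -- check your provider's pricing
cost = estimate_cost(prompt_tokens=10, completion_tokens=15,
                     input_price_per_1k=0.0025, output_price_per_1k=0.01)
print(f"Estimated cost: ${cost:.6f}")  # prints "Estimated cost: $0.000175"
```

Because output tokens typically cost several times more than input tokens, separating the two rates gives a noticeably better estimate for generation-heavy workloads than a single blended price.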
Common variations
You can also estimate costs for asynchronous or streaming calls by tracking tokens incrementally. Different models have different prices, so adjust price_per_1k_tokens accordingly; for example, gpt-4o-mini is cheaper than gpt-4o.
import asyncio
import os

from openai import AsyncOpenAI

async def estimate_cost_async():
    # The async client is AsyncOpenAI; the plain OpenAI client is synchronous
    client = AsyncOpenAI(api_key=os.environ["OPENAI_API_KEY"])
    response = await client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": "Estimate cost async example."}],
    )
    usage = response.usage
    total_tokens = usage.total_tokens
    price_per_1k_tokens = 0.006  # Example price for gpt-4o-mini; check current pricing
    cost = (total_tokens / 1000) * price_per_1k_tokens
    print(f"Total tokens: {total_tokens}")
    print(f"Estimated cost: ${cost:.6f}")

asyncio.run(estimate_cost_async())

Total tokens: 20
Estimated cost: $0.000120
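For streaming calls, the OpenAI chat completions API can append a final chunk carrying usage data when you pass stream_options={"include_usage": True}; until that chunk arrives, you accumulate the streamed text incrementally. The loop below sketches that accumulation logic over simulated chunks (the dicts stand in for real stream events, and the shapes and token counts are assumptions for illustration):

```python
# Simulated stream: dicts stand in for real chunk objects. With the real API,
# pass stream_options={"include_usage": True} and the final chunk reports usage.
chunks = [
    {"content": "Hello", "usage": None},
    {"content": " world", "usage": None},
    {"content": None, "usage": {"prompt_tokens": 8, "completion_tokens": 12,
                                "total_tokens": 20}},
]

price_per_1k_tokens = 0.006  # placeholder price; check current pricing
text_parts, usage = [], None
for chunk in chunks:
    if chunk["content"] is not None:
        text_parts.append(chunk["content"])  # accumulate streamed text
    if chunk["usage"] is not None:
        usage = chunk["usage"]               # final chunk reports token usage

cost = (usage["total_tokens"] / 1000) * price_per_1k_tokens
print("".join(text_parts))                   # prints "Hello world"
print(f"Estimated cost: ${cost:.6f}")        # prints "Estimated cost: $0.000120"
```

The same pattern works inside an async for loop over a real stream; the key point is that usage is only available once the stream finishes.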
Troubleshooting
If the usage field is missing from the response, ensure your SDK and API version are up to date and that your model supports usage reporting, and verify your API key permissions. For unexpectedly high costs, monitor token usage closely and consider switching to smaller models or setting max_tokens to cap output length.
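One way to cap exposure before a call is made is to set max_tokens and compute a worst-case cost bound from it. A hedged sketch follows; max_tokens is a real chat-completions parameter, but the price values here are placeholders, not real price-sheet figures:

```python
def worst_case_cost(prompt_tokens: int, max_tokens: int,
                    input_price_per_1k: float, output_price_per_1k: float) -> float:
    """Upper bound on cost if the model uses every allowed completion token."""
    return (prompt_tokens / 1000) * input_price_per_1k \
        + (max_tokens / 1000) * output_price_per_1k

# Placeholder prices; enforce the cap by passing max_tokens to the API call:
# client.chat.completions.create(..., max_tokens=100)
bound = worst_case_cost(prompt_tokens=50, max_tokens=100,
                        input_price_per_1k=0.0025, output_price_per_1k=0.01)
print(f"Worst-case cost: ${bound:.6f}")  # prints "Worst-case cost: $0.001125"
```

Checking this bound before dispatching a batch of requests lets you reject jobs that could exceed a budget, rather than discovering the overrun in the bill.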
Key Takeaways
- Use the API response's usage field to get accurate token counts for cost estimation.
- Multiply total tokens by the model's per-1K-token price to calculate your LLM API cost.
- Adjust pricing based on the specific model and provider as costs vary significantly.
- Monitor token usage in real time for streaming or async calls to control expenses.
- Keep SDK and API versions current to ensure usage data is available and accurate.