How to estimate LLM API costs
Estimate LLM API costs by multiplying the number of tokens processed (input + output) by the model's per-token price. Use the usage data returned in API responses or logs, and factor in model choice, since prices vary by model and provider.

Prerequisites
- Python 3.8+
- OpenAI API key (free tier works)
- pip install "openai>=1.0"
Setup
Install the official openai Python SDK and set your API key as an environment variable for secure access.
pip install "openai>=1.0"

Collecting openai
Downloading openai-1.x.x-py3-none-any.whl
Installing collected packages: openai
Successfully installed openai-1.x.x
Step by step
Use the API response's usage field to get token counts, then multiply by the model's per-token price to estimate cost.
import os
from openai import OpenAI
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])
# Example chat completion call
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello, how much will this cost?"}],
)
# Extract token usage
usage = response.usage
prompt_tokens = usage.prompt_tokens
completion_tokens = usage.completion_tokens
total_tokens = usage.total_tokens
# Blended price per 1K tokens for gpt-4o (illustrative only; real price
# sheets list input and output tokens separately -- check current pricing)
price_per_1k_tokens = 0.03  # USD
# Calculate cost
cost = (total_tokens / 1000) * price_per_1k_tokens
print(f"Prompt tokens: {prompt_tokens}")
print(f"Completion tokens: {completion_tokens}")
print(f"Total tokens: {total_tokens}")
print(f"Estimated cost: ${cost:.6f}")

Prompt tokens: 10
Completion tokens: 15
Total tokens: 25
Estimated cost: $0.000750
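Most providers actually price input (prompt) and output (completion) tokens at different rates, so a slightly more accurate estimate prices the two token types separately. A minimal sketch of such a helper follows; the price values are placeholders for illustration, not real price-sheet figures:

```python
def estimate_cost(prompt_tokens: int, completion_tokens: int,
                  input_price_per_1k: float, output_price_per_1k: float) -> float:
    """Estimate cost in USD, pricing prompt and completion tokens separately."""
    return (prompt_tokens / 1000) * input_price_per_1k \
        + (completion_tokens / 1000) * output_price_per_1k

# Placeholder prices for illustration only -- check your provider's pricing
cost = estimate_cost(prompt_tokens=10, completion_tokens=15,
                     input_price_per_1k=0.0025, output_price_per_1k=0.01)
print(f"Estimated cost: ${cost:.6f}")  # prints "Estimated cost: $0.000175"
```

Because output tokens typically cost several times more than input tokens, separating the two rates gives a noticeably better estimate for generation-heavy workloads than a single blended price.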
Common variations
You can also estimate costs for asynchronous or streaming calls by tracking tokens incrementally. Different models have different prices, so adjust price_per_1k_tokens accordingly; for example, gpt-4o-mini is cheaper than gpt-4o.
import asyncio
import os

from openai import AsyncOpenAI

async def estimate_cost_async():
    # The async client is AsyncOpenAI; the plain OpenAI client is synchronous
    client = AsyncOpenAI(api_key=os.environ["OPENAI_API_KEY"])
    response = await client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": "Estimate cost async example."}],
    )
    usage = response.usage
    total_tokens = usage.total_tokens
    price_per_1k_tokens = 0.006  # Example price for gpt-4o-mini; check current pricing
    cost = (total_tokens / 1000) * price_per_1k_tokens
    print(f"Total tokens: {total_tokens}")
    print(f"Estimated cost: ${cost:.6f}")

asyncio.run(estimate_cost_async())

Total tokens: 20
Estimated cost: $0.000120
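For streaming calls, the OpenAI chat completions API can append a final chunk carrying usage data when you pass stream_options={"include_usage": True}; until that chunk arrives, you accumulate the streamed text incrementally. The loop below sketches that accumulation logic over simulated chunks (the dicts stand in for real stream events, and the shapes and token counts are assumptions for illustration):

```python
# Simulated stream: dicts stand in for real chunk objects. With the real API,
# pass stream_options={"include_usage": True} and the final chunk reports usage.
chunks = [
    {"content": "Hello", "usage": None},
    {"content": " world", "usage": None},
    {"content": None, "usage": {"prompt_tokens": 8, "completion_tokens": 12,
                                "total_tokens": 20}},
]

price_per_1k_tokens = 0.006  # placeholder price; check current pricing
text_parts, usage = [], None
for chunk in chunks:
    if chunk["content"] is not None:
        text_parts.append(chunk["content"])  # accumulate streamed text
    if chunk["usage"] is not None:
        usage = chunk["usage"]               # final chunk reports token usage

cost = (usage["total_tokens"] / 1000) * price_per_1k_tokens
print("".join(text_parts))                   # prints "Hello world"
print(f"Estimated cost: ${cost:.6f}")        # prints "Estimated cost: $0.000120"
```

The same pattern works inside an async for loop over a real stream; the key point is that usage is only available once the stream finishes.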
Troubleshooting
If the usage field is missing from the response, ensure your SDK and API version are up to date and that your model supports usage reporting, and verify your API key permissions. For unexpectedly high costs, monitor token usage closely and consider switching to smaller models or setting max_tokens to cap output length.
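One way to cap exposure before a call is made is to set max_tokens and compute a worst-case cost bound from it. A hedged sketch follows; max_tokens is a real chat-completions parameter, but the price values here are placeholders, not real price-sheet figures:

```python
def worst_case_cost(prompt_tokens: int, max_tokens: int,
                    input_price_per_1k: float, output_price_per_1k: float) -> float:
    """Upper bound on cost if the model uses every allowed completion token."""
    return (prompt_tokens / 1000) * input_price_per_1k \
        + (max_tokens / 1000) * output_price_per_1k

# Placeholder prices; enforce the cap by passing max_tokens to the API call:
# client.chat.completions.create(..., max_tokens=100)
bound = worst_case_cost(prompt_tokens=50, max_tokens=100,
                        input_price_per_1k=0.0025, output_price_per_1k=0.01)
print(f"Worst-case cost: ${bound:.6f}")  # prints "Worst-case cost: $0.001125"
```

Checking this bound before dispatching a batch of requests lets you reject jobs that could exceed a budget, rather than discovering the overrun in the bill.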
Key Takeaways
- Use the API response's usage field to get accurate token counts for cost estimation.
- Multiply total tokens by the model's per-1K-token price to calculate your LLM API cost.
- Adjust pricing based on the specific model and provider as costs vary significantly.
- Monitor token usage in real time for streaming or async calls to control expenses.
- Keep SDK and API versions current to ensure usage data is available and accurate.