How to · beginner · 3 min read

How to attribute LLM costs per user

Quick answer
Attribute LLM costs per user by tracking the number of tokens each user consumes via the usage field in the API response, then multiplying by the model's per-token pricing. This enables precise per-user cost allocation for billing or budgeting.

PREREQUISITES

  • Python 3.8+
  • OpenAI API key (free tier works)
  • pip install openai>=1.0

Setup

Install the openai Python package and set your API key as an environment variable for secure access.

bash
pip install openai
output
Collecting openai
  Downloading openai-1.x.x-py3-none-any.whl (xx kB)
Installing collected packages: openai
Successfully installed openai-1.x.x

Step by step

Send each user's prompt through the API, read the usage data from the response, and multiply the token count by the model's per-token price to get a cost per user.

python
import os
from openai import OpenAI

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

# Example user prompts
user_prompts = {
    "user_1": "Explain quantum computing",
    "user_2": "Summarize the latest AI news"
}

# Illustrative blended price per 1,000 tokens -- real providers price
# prompt and completion tokens separately; check the official pricing page.
price_per_1k_tokens = 0.003  # USD

user_costs = {}

for user, prompt in user_prompts.items():
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}]
    )
    usage = response.usage  # CompletionUsage object with prompt_tokens, completion_tokens, total_tokens
    total_tokens = usage.total_tokens
    cost = (total_tokens / 1000) * price_per_1k_tokens
    user_costs[user] = {
        "tokens": total_tokens,
        "cost_usd": round(cost, 6)
    }

print(user_costs)
output
{'user_1': {'tokens': 120, 'cost_usd': 0.00036}, 'user_2': {'tokens': 95, 'cost_usd': 0.000285}}
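The flat per-1k rate above is a simplification: providers typically charge prompt (input) and completion (output) tokens at different rates, both of which are reported separately in the usage field. A minimal sketch of the split calculation, with placeholder rates (check the provider's pricing page for real values):

```python
def cost_usd(prompt_tokens, completion_tokens,
             input_price_per_1k, output_price_per_1k):
    """Cost with separate input/output rates per 1,000 tokens."""
    return (prompt_tokens / 1000) * input_price_per_1k \
         + (completion_tokens / 1000) * output_price_per_1k

# Placeholder rates, not real pricing:
print(cost_usd(100, 20, input_price_per_1k=0.00015, output_price_per_1k=0.0006))
```

In the loop above you would call this with `usage.prompt_tokens` and `usage.completion_tokens` instead of `usage.total_tokens`.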

Common variations

You can extend this approach to async calls, streaming responses, or different models by adjusting the SDK usage and pricing accordingly. For multi-user systems, store usage data in a database for aggregation and reporting.

python
import asyncio
import os

from openai import AsyncOpenAI

client = AsyncOpenAI(api_key=os.environ["OPENAI_API_KEY"])

async def get_user_cost(user, prompt, price_per_1k):
    response = await client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}]
    )
    usage = response.usage
    total_tokens = usage.total_tokens
    cost = (total_tokens / 1000) * price_per_1k
    return user, total_tokens, round(cost, 6)

async def main():
    user_prompts = {
        "user_1": "Explain quantum computing",
        "user_2": "Summarize the latest AI news"
    }
    price_per_1k_tokens = 0.003
    tasks = [get_user_cost(u, p, price_per_1k_tokens) for u, p in user_prompts.items()]
    results = await asyncio.gather(*tasks)
    for user, tokens, cost in results:
        print(f"{user}: Tokens={tokens}, Cost=${cost}")

asyncio.run(main())
output
user_1: Tokens=120, Cost=$0.00036
user_2: Tokens=95, Cost=$0.000285
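For multi-user systems, the per-call figures above can be written to a database and aggregated at billing time. A minimal sketch using an in-memory SQLite table (the schema and the logged values are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE usage_log (
    user_id TEXT, total_tokens INTEGER, cost_usd REAL)""")

# Each API call appends one row (values here are illustrative).
rows = [("user_1", 120, 0.00036), ("user_2", 95, 0.000285),
        ("user_1", 80, 0.00024)]
conn.executemany("INSERT INTO usage_log VALUES (?, ?, ?)", rows)

# Aggregate per user for billing or budgeting.
for user, tokens, cost in conn.execute(
        "SELECT user_id, SUM(total_tokens), ROUND(SUM(cost_usd), 6) "
        "FROM usage_log GROUP BY user_id ORDER BY user_id"):
    print(f"{user}: tokens={tokens}, cost_usd={cost}")
```

In production you would use a persistent database and add a timestamp column so usage can be rolled up per billing period.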

Troubleshooting

  • If usage is missing in the response, make sure you are on a supported model and a current SDK version; for streaming calls, pass stream_options={"include_usage": True} so usage is reported.
  • Tokenization varies by model, so the same prompt can produce different token counts; always verify current pricing in the provider's official docs.
  • For multi-tenant apps, aggregate usage carefully to avoid double counting shared prompts.
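On the streaming point above: with the OpenAI SDK, passing stream_options={"include_usage": True} makes the final streamed chunk carry a usage object (with an empty choices list). A small duck-typed helper to accumulate the streamed text and capture that usage:

```python
def consume_stream(stream):
    """Accumulate streamed text and capture usage from the final chunk.

    Works with any iterable of chunk-like objects exposing .choices
    (each with .delta.content) and .usage, as the OpenAI SDK's stream
    does when stream_options={"include_usage": True} is set.
    """
    text_parts, usage = [], None
    for chunk in stream:
        if chunk.choices and chunk.choices[0].delta.content:
            text_parts.append(chunk.choices[0].delta.content)
        if chunk.usage is not None:  # only the last chunk carries usage
            usage = chunk.usage
    return "".join(text_parts), usage
```

Pass it the result of client.chat.completions.create(..., stream=True, stream_options={"include_usage": True}), then read usage.total_tokens exactly as in the non-streaming examples.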

Key Takeaways

  • Use the usage field in LLM API responses to track tokens per user.
  • Multiply token usage by the model's per-token price for accurate cost attribution.
  • Store and aggregate usage data in your backend for multi-user billing or budgeting.
  • Adjust for different models and async or streaming calls as needed.
  • Verify pricing regularly as it may change with provider updates.
Verified 2026-04 · gpt-4o-mini