How-to · Beginner · 3 min read

How to manage OpenAI token usage

Quick answer
Every OpenAI API response includes a usage field reporting prompt_tokens, completion_tokens, and total_tokens. Log these values and apply limits such as max_tokens in your code to monitor consumption and keep costs predictable.

Prerequisites

  • Python 3.8+
  • OpenAI API key (free tier works)
  • pip install openai>=1.0

Setup

Install the official OpenAI Python SDK and set your API key as an environment variable to authenticate requests.

bash
pip install "openai>=1.0"        # quote the spec so the shell doesn't treat >= as a redirect
export OPENAI_API_KEY="sk-..."   # placeholder: use your actual key

Step by step

This example demonstrates how to send a chat completion request and extract token usage details from the response to monitor your consumption.

python
import os
from openai import OpenAI

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Explain token usage management."}]
)

print("Response:", response.choices[0].message.content)
print("Token usage details:")
print(f"Prompt tokens: {response.usage.prompt_tokens}")
print(f"Completion tokens: {response.usage.completion_tokens}")
print(f"Total tokens: {response.usage.total_tokens}")
output
Response: Token usage management involves tracking prompt and completion tokens to optimize costs.
Token usage details:
Prompt tokens: 15
Completion tokens: 20
Total tokens: 35
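To make this monitoring systematic rather than ad hoc, you can wrap the extraction in a small helper and append each request's counts to a log. A minimal sketch; the log_usage function and its dict layout are illustrative, not part of the SDK:

```python
from datetime import datetime, timezone

def log_usage(response, log):
    """Append one request's token counts to an in-memory log list."""
    usage = response.usage
    entry = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "prompt_tokens": usage.prompt_tokens,
        "completion_tokens": usage.completion_tokens,
        "total_tokens": usage.total_tokens,
    }
    log.append(entry)
    return entry
```

Call log_usage(response, log) after each client.chat.completions.create call; swapping the list for a file or database write follows the same shape.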

Common variations

You can manage token usage by setting max_tokens to cap completion length, or by streaming responses and requesting usage in the final chunk. Models also differ in context-window size, so choose one whose limit comfortably fits your prompts.

python
import os
from openai import OpenAI

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

# Limit tokens to control usage
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Summarize token usage."}],
    max_tokens=50
)

print(f"Total tokens used: {response.usage.total_tokens}")
output
Total tokens used: 45
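When streaming, individual chunks don't carry usage by default; passing stream_options={"include_usage": True} makes the final chunk include the full usage object (with empty choices), while every earlier chunk has usage set to None. A sketch under that assumption; usage_from_stream is a hypothetical helper, not an SDK function:

```python
def usage_from_stream(chunks):
    """Collect streamed text plus the usage object from the final chunk.

    Assumes the stream was created with stream_options={"include_usage": True},
    so .usage is None on every chunk except the last.
    """
    text_parts, usage = [], None
    for chunk in chunks:
        if chunk.choices and chunk.choices[0].delta.content:
            text_parts.append(chunk.choices[0].delta.content)
        if chunk.usage is not None:
            usage = chunk.usage
    return "".join(text_parts), usage

# Example call (requires a valid API key):
# stream = client.chat.completions.create(
#     model="gpt-4o",
#     messages=[{"role": "user", "content": "Summarize token usage."}],
#     stream=True,
#     stream_options={"include_usage": True},
# )
# text, usage = usage_from_stream(stream)
```

This lets you show output incrementally while still recording exact token counts once the stream finishes.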

Troubleshooting

If the API rejects a request for exceeding the model's context window, shorten your prompt or lower max_tokens. Monitor total_tokens on every response to avoid drifting past model limits and incurring unexpected costs.
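You can also catch oversized prompts before sending them. A crude pre-flight check, assuming roughly four characters per token for English text (the heuristic, the 128,000-token default for gpt-4o's window, and both function names are assumptions; use the tiktoken library for exact counts):

```python
def rough_token_estimate(text):
    """Very rough estimate: ~4 characters per token for English text."""
    return max(1, len(text) // 4)

def fits_in_context(prompt, context_limit=128_000, reply_budget=1_000):
    """Check whether an estimated prompt plus a reply budget fits the window."""
    return rough_token_estimate(prompt) + reply_budget <= context_limit
```

Reserving a reply_budget up front ensures the model has room to answer even when the prompt itself fits.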

Key Takeaways

  • Always check the usage field in API responses to monitor token consumption.
  • Use max_tokens to limit completion length and control costs.
  • Adjust prompt length and model choice based on token limits to avoid errors.
  • Log token usage regularly to track spending and optimize usage patterns.
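The logging and limiting takeaways above can be combined into a running budget that refuses further work once a cap is hit. A minimal sketch; the TokenBudget class is a hypothetical example, not an SDK feature:

```python
class TokenBudget:
    """Track cumulative token spend across requests against a hard cap."""

    def __init__(self, max_total_tokens):
        self.max_total_tokens = max_total_tokens
        self.used = 0

    def record(self, response):
        """Add one response's total_tokens to the running count."""
        self.used += response.usage.total_tokens
        return self.used

    def remaining(self):
        """Tokens left before the cap is reached."""
        return max(0, self.max_total_tokens - self.used)

    def exhausted(self):
        """True once the cap has been reached or exceeded."""
        return self.used >= self.max_total_tokens
```

Check budget.exhausted() before each request and call budget.record(response) after it; sizing max_total_tokens from your model's per-token price turns this directly into a spending cap.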
Verified 2026-04 · gpt-4o