How-to · Beginner · 3 min read

How to manage OpenAI token usage

Quick answer
Every OpenAI API response includes a usage field reporting prompt_tokens, completion_tokens, and total_tokens. Log these values and apply limits such as max_tokens in your code to monitor consumption and keep costs predictable.

Prerequisites

  • Python 3.8+
  • OpenAI API key (free tier works)
  • pip install openai>=1.0

Setup

Install the official OpenAI Python SDK and set your API key as an environment variable to authenticate requests.

bash
pip install "openai>=1.0"        # quote the spec so the shell doesn't treat >= as a redirect
export OPENAI_API_KEY="sk-..."   # placeholder: use your actual key

Step by step

This example demonstrates how to send a chat completion request and extract token usage details from the response to monitor your consumption.

python
import os
from openai import OpenAI

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Explain token usage management."}]
)

print("Response:", response.choices[0].message.content)
print("Token usage details:")
print(f"Prompt tokens: {response.usage.prompt_tokens}")
print(f"Completion tokens: {response.usage.completion_tokens}")
print(f"Total tokens: {response.usage.total_tokens}")
output
Response: Token usage management involves tracking prompt and completion tokens to optimize costs.
Token usage details:
Prompt tokens: 15
Completion tokens: 20
Total tokens: 35
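To make this monitoring systematic rather than ad hoc, you can wrap the extraction in a small helper and append each request's counts to a log. A minimal sketch; the log_usage function and its dict layout are illustrative, not part of the SDK:

```python
from datetime import datetime, timezone

def log_usage(response, log):
    """Append one request's token counts to an in-memory log list."""
    usage = response.usage
    entry = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "prompt_tokens": usage.prompt_tokens,
        "completion_tokens": usage.completion_tokens,
        "total_tokens": usage.total_tokens,
    }
    log.append(entry)
    return entry
```

Call log_usage(response, log) after each client.chat.completions.create call; swapping the list for a file or database write follows the same shape.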

Common variations

You can manage token usage by setting max_tokens to cap completion length, or by streaming responses and requesting usage in the final chunk. Models also differ in context-window size, so choose one whose limit comfortably fits your prompts.

python
import os
from openai import OpenAI

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

# Limit tokens to control usage
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Summarize token usage."}],
    max_tokens=50
)

print(f"Total tokens used: {response.usage.total_tokens}")
output
Total tokens used: 45
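When streaming, individual chunks don't carry usage by default; passing stream_options={"include_usage": True} makes the final chunk include the full usage object (with empty choices), while every earlier chunk has usage set to None. A sketch under that assumption; usage_from_stream is a hypothetical helper, not an SDK function:

```python
def usage_from_stream(chunks):
    """Collect streamed text plus the usage object from the final chunk.

    Assumes the stream was created with stream_options={"include_usage": True},
    so .usage is None on every chunk except the last.
    """
    text_parts, usage = [], None
    for chunk in chunks:
        if chunk.choices and chunk.choices[0].delta.content:
            text_parts.append(chunk.choices[0].delta.content)
        if chunk.usage is not None:
            usage = chunk.usage
    return "".join(text_parts), usage

# Example call (requires a valid API key):
# stream = client.chat.completions.create(
#     model="gpt-4o",
#     messages=[{"role": "user", "content": "Summarize token usage."}],
#     stream=True,
#     stream_options={"include_usage": True},
# )
# text, usage = usage_from_stream(stream)
```

This lets you show output incrementally while still recording exact token counts once the stream finishes.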

Troubleshooting

If the API rejects a request for exceeding the model's context window, shorten your prompt or lower max_tokens. Monitor total_tokens on every response to avoid drifting past model limits and incurring unexpected costs.
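You can also catch oversized prompts before sending them. A crude pre-flight check, assuming roughly four characters per token for English text (the heuristic, the 128,000-token default for gpt-4o's window, and both function names are assumptions; use the tiktoken library for exact counts):

```python
def rough_token_estimate(text):
    """Very rough estimate: ~4 characters per token for English text."""
    return max(1, len(text) // 4)

def fits_in_context(prompt, context_limit=128_000, reply_budget=1_000):
    """Check whether an estimated prompt plus a reply budget fits the window."""
    return rough_token_estimate(prompt) + reply_budget <= context_limit
```

Reserving a reply_budget up front ensures the model has room to answer even when the prompt itself fits.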

Key Takeaways

  • Always check the usage field in API responses to monitor token consumption.
  • Use max_tokens to limit completion length and control costs.
  • Adjust prompt length and model choice based on token limits to avoid errors.
  • Log token usage regularly to track spending and optimize usage patterns.
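The logging and limiting takeaways above can be combined into a running budget that refuses further work once a cap is hit. A minimal sketch; the TokenBudget class is a hypothetical example, not an SDK feature:

```python
class TokenBudget:
    """Track cumulative token spend across requests against a hard cap."""

    def __init__(self, max_total_tokens):
        self.max_total_tokens = max_total_tokens
        self.used = 0

    def record(self, response):
        """Add one response's total_tokens to the running count."""
        self.used += response.usage.total_tokens
        return self.used

    def remaining(self):
        """Tokens left before the cap is reached."""
        return max(0, self.max_total_tokens - self.used)

    def exhausted(self):
        """True once the cap has been reached or exceeded."""
        return self.used >= self.max_total_tokens
```

Check budget.exhausted() before each request and call budget.record(response) after it; sizing max_total_tokens from your model's per-token price turns this directly into a spending cap.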
Verified 2026-04 · gpt-4o