Message length and token budgets
Why this matters
Every API call to OpenAI charges by token usage, not by request count. A 10,000-word prompt and a 100-word prompt are billed differently. Without understanding token math, you'll either overspend or hit rate limits with vague error messages.
Explanation
What this does: The OpenAI Python SDK provides the tiktoken library integration to count tokens in text before sending it to the API. Tokens are how OpenAI measures and charges for API usage: roughly 1 token ≈ 4 characters in English, but varies by language and model.
How it works: When you create a message, the text is broken into tokens using a model-specific encoding. GPT-4 uses different tokenization than GPT-3.5. The OpenAI SDK counts tokens by loading the correct encoding for your model, then passing your text through it. The API returns usage.prompt_tokens and usage.completion_tokens in the response: but you should count before sending to budget accurately.
When to use it: Always count tokens before sending large batches of messages, before injecting user data into system prompts, or when building chatbots that accumulate conversation history. This prevents bill shock and helps you decide whether to truncate context or use a smaller model.
Request code
import tiktoken
from openai import OpenAI
client = OpenAI()
encoding = tiktoken.encoding_for_model('gpt-4')
message_text = "Explain quantum computing in 500 words. Include practical applications and limitations."
tokens = encoding.encode(message_text)
token_count = len(tokens)
print(f"Message: {message_text}")
print(f"Token count: {token_count}")
print(f"Estimated cost (GPT-4 input): ${token_count * 0.00003:.4f}")
if token_count > 8000:
print("Warning: This message uses more than 8000 tokens. Consider truncating.")
else:
response = client.chat.completions.create(
model='gpt-4',
messages=[
{'role': 'user', 'content': message_text}
]
)
print(f"\nAPI Response:")
print(f"Prompt tokens used: {response.usage.prompt_tokens}")
print(f"Completion tokens used: {response.usage.completion_tokens}")
print(f"Total tokens used: {response.usage.total_tokens}")
print(f"\nAssistant: {response.choices[0].message.content[:200]}...") Authentication
Set your API key as an environment variable before running: export OPENAI_API_KEY='sk-proj-your-actual-key-here' The OpenAI() client will read this automatically. No additional auth setup required beyond this.
Response shape
| Field | Description |
|---|---|
usage | [object Object] |
choices | [object Object] |
Field guide
prompt_tokens This is what you pay for input. Multiply by the input price per 1M tokens (e.g., GPT-4 is $0.03 per 1M tokens) to calculate input cost.
completion_tokens What the model generated. Always more expensive per token than input. GPT-4 output is $0.06 per 1M tokens vs $0.03 for input.
total_tokens The hidden gotcha: this tells you if the API truncated your context window. If you sent 6000 tokens but total_tokens is way lower, your context was cut off. Check this every time.
Setup trap
The tiktoken library is not installed by default. If you get ModuleNotFoundError: No module named 'tiktoken', install it with pip install tiktoken. The OpenAI SDK includes it as an optional dependency.
Cost
GPT-4 as of April 2026 charges $0.03 per 1M input tokens and $0.06 per 1M output tokens. A message that uses 100,000 tokens costs $3 in input alone. If you're running this in a loop 1000 times, that's $3,000. Counting tokens before the loop and truncating at 10,000 tokens saves $2,700.
Rate limits
OpenAI enforces rate limits by tokens per minute, not requests per minute. With a standard tier account, you're limited to 90,000 tokens per minute. A single 50,000-token message followed by another hits your limit immediately. Count tokens and implement exponential backoff on 429 errors.
Common gotcha
Developers count tokens for the message text, but forget that the API also tokenizes the system prompt, previous conversation history, and JSON structure of your messages. A 500-token message becomes 650 tokens by the time it hits the billing meter because of the {'role': 'user', 'content': ...} wrapper and any system context. Always test with a real API call and check response.usage.prompt_tokens: it will be higher than your manual count.
Error recovery
ModuleNotFoundError: No module named 'tiktoken'RateLimitErrorInvalidRequestError: This model's maximum context length is...Experienced dev note
Senior developers automate token counting at message construction time. Build a helper function that wraps your message creation and logs token counts to a monitoring system. This catches runaway context windows before they hit production. Also: never trust user input. A user can paste 1M characters of text as a 'search query.' Always count, always have a maximum token budget, and always truncate before sending. The 3-second latency hit from token counting is worth the cost savings and stability.
Check your understanding
You're building a chatbot that keeps the last 10 messages as context. The 10 messages total 45,000 tokens. GPT-4's context window is 128,000 tokens. A user sends a new 800-token message. Will this request succeed? Why or why not?
Show answer hint
Don't just add 45000 + 800. You must count tokens for the entire request structure that goes to the API: the system prompt (if any), all 10 previous messages with role/content wrappers, the new message, and the JSON structure itself. Also, the API reserves tokens for its own output: check OpenAI's documentation on how much completion tokens you need to reserve.