API Beginner easy · 5 min

Message length and token budgets

What you will learn

Calculate and control how many tokens your messages consume before sending them to OpenAI's API to avoid unexpected costs and rate limits.

Why this matters

Every API call to OpenAI charges by token usage, not by request count. A 10,000-word prompt and a 100-word prompt are billed differently. Without understanding token math, you'll either overspend or hit rate limits with vague error messages.

Skip if: If you're building a prototype that runs once, or you're certain your messages are under 100 tokens, you can skip token counting. But the moment you're building production systems or iterating on prompts, you need this.

Explanation

What this does: The OpenAI Python SDK provides the tiktoken library integration to count tokens in text before sending it to the API. Tokens are how OpenAI measures and charges for API usage: roughly 1 token ≈ 4 characters in English, but varies by language and model.

How it works: When you create a message, the text is broken into tokens using a model-specific encoding. GPT-4 uses different tokenization than GPT-3.5. The OpenAI SDK counts tokens by loading the correct encoding for your model, then passing your text through it. The API returns usage.prompt_tokens and usage.completion_tokens in the response: but you should count before sending to budget accurately.

When to use it: Always count tokens before sending large batches of messages, before injecting user data into system prompts, or when building chatbots that accumulate conversation history. This prevents bill shock and helps you decide whether to truncate context or use a smaller model.

Request code

Illustrative only - not runnable without a valid API key

python

import tiktoken
from openai import OpenAI

client = OpenAI()

encoding = tiktoken.encoding_for_model('gpt-4')

message_text = "Explain quantum computing in 500 words. Include practical applications and limitations."
tokens = encoding.encode(message_text)
token_count = len(tokens)

print(f"Message: {message_text}")
print(f"Token count: {token_count}")
print(f"Estimated cost (GPT-4 input): ${token_count * 0.00003:.4f}")

if token_count > 8000:
    print("Warning: This message uses more than 8000 tokens. Consider truncating.")
else:
    response = client.chat.completions.create(
        model='gpt-4',
        messages=[
            {'role': 'user', 'content': message_text}
        ]
    )
    print(f"\nAPI Response:")
    print(f"Prompt tokens used: {response.usage.prompt_tokens}")
    print(f"Completion tokens used: {response.usage.completion_tokens}")
    print(f"Total tokens used: {response.usage.total_tokens}")
    print(f"\nAssistant: {response.choices[0].message.content[:200]}...")

Authentication

Set your API key as an environment variable before running: export OPENAI_API_KEY='sk-proj-your-actual-key-here' The OpenAI() client will read this automatically. No additional auth setup required beyond this.

Response shape

Field	Description
`usage`	[object Object]
`choices`	[object Object]

Field guide

prompt_tokens

This is what you pay for input. Multiply by the input price per 1M tokens (e.g., GPT-4 is $0.03 per 1M tokens) to calculate input cost.

completion_tokens

What the model generated. Always more expensive per token than input. GPT-4 output is $0.06 per 1M tokens vs $0.03 for input.

total_tokens

The hidden gotcha: this tells you if the API truncated your context window. If you sent 6000 tokens but total_tokens is way lower, your context was cut off. Check this every time.

Setup trap

The tiktoken library is not installed by default. If you get ModuleNotFoundError: No module named 'tiktoken', install it with pip install tiktoken. The OpenAI SDK includes it as an optional dependency.

Cost

GPT-4 as of April 2026 charges $0.03 per 1M input tokens and $0.06 per 1M output tokens. A message that uses 100,000 tokens costs $3 in input alone. If you're running this in a loop 1000 times, that's $3,000. Counting tokens before the loop and truncating at 10,000 tokens saves $2,700.

Rate limits

OpenAI enforces rate limits by tokens per minute, not requests per minute. With a standard tier account, you're limited to 90,000 tokens per minute. A single 50,000-token message followed by another hits your limit immediately. Count tokens and implement exponential backoff on 429 errors.

Common gotcha

Developers count tokens for the message text, but forget that the API also tokenizes the system prompt, previous conversation history, and JSON structure of your messages. A 500-token message becomes 650 tokens by the time it hits the billing meter because of the {'role': 'user', 'content': ...} wrapper and any system context. Always test with a real API call and check response.usage.prompt_tokens: it will be higher than your manual count.

Error recovery

ModuleNotFoundError: No module named 'tiktoken'

Install tiktoken: pip install tiktoken. The OpenAI SDK does not bundle it by default.

RateLimitError

You've sent too many tokens in too short a time. Check response.usage.total_tokens and add time.sleep() between requests, or batch smaller messages.

InvalidRequestError: This model's maximum context length is...

Your total message length (including history and system prompt) exceeds the model's context window. Count all tokens in the conversation, not just the latest message.

Experienced dev note

Senior developers automate token counting at message construction time. Build a helper function that wraps your message creation and logs token counts to a monitoring system. This catches runaway context windows before they hit production. Also: never trust user input. A user can paste 1M characters of text as a 'search query.' Always count, always have a maximum token budget, and always truncate before sending. The 3-second latency hit from token counting is worth the cost savings and stability.

Check your understanding

You're building a chatbot that keeps the last 10 messages as context. The 10 messages total 45,000 tokens. GPT-4's context window is 128,000 tokens. A user sends a new 800-token message. Will this request succeed? Why or why not?

Show answer hint

Don't just add 45000 + 800. You must count tokens for the entire request structure that goes to the API: the system prompt (if any), all 10 previous messages with role/content wrappers, the new message, and the JSON structure itself. Also, the API reserves tokens for its own output: check OpenAI's documentation on how much completion tokens you need to reserve.

VERSION tiktoken encoding for 'gpt-4' switched in late 2024 to 'gpt-4-turbo' and 'gpt-4o' encodings. Use encoding_for_model('gpt-4') for legacy models, or check OpenAI's docs for which encoding your specific model version requires. GPT-4-turbo and newer use the same 'cl100k_base' encoding, but pricing and token limits differ.

Community Notes

No notes yetBe the first to share a version-specific fix or tip.