
How to reduce token usage in OpenAI API calls

Quick answer
To reduce token usage in OpenAI API calls, keep prompts short by cutting unnecessary text and writing concise instructions. Also cap output length with max_tokens and reuse conversation context selectively so you don't resend redundant tokens.

Prerequisites

  • Python 3.8+
  • OpenAI API key (free tier works)
  • pip install "openai>=1.0"

Setup

Install the official openai Python SDK and set your API key as an environment variable.

bash
pip install "openai>=1.0"

Step by step

This example shows how to reduce token usage by trimming the prompt and limiting max_tokens. It uses the gpt-4o model with the OpenAI SDK v1 pattern. Model outputs are non-deterministic, so the sample outputs below are illustrative and your exact text may differ.

python
import os
from openai import OpenAI

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

# Original verbose prompt
long_prompt = """You are a helpful assistant. Please provide a detailed explanation about the benefits of token optimization in API calls. """

# Reduced prompt by removing unnecessary words
short_prompt = "Explain benefits of token optimization in API calls."

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": short_prompt}],
    max_tokens=100  # Limit max tokens to reduce output length
)

print(response.choices[0].message.content)
output
Token optimization reduces API costs and improves response speed by minimizing unnecessary data sent and received.

Common variations

You can also reduce tokens by using shorter system instructions, reusing conversation history selectively, or switching to smaller models like gpt-4o-mini for less complex tasks.

python
import os
from openai import OpenAI

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

# Using a smaller model and concise system message
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "system", "content": "Be brief."},
        {"role": "user", "content": "Summarize token usage reduction."}
    ],
    max_tokens=50
)

print(response.choices[0].message.content)
output
Token usage reduction saves cost by limiting prompt and response length.
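The variation above also mentions reusing conversation history selectively. One simple approach is to keep the system message plus only the most recent turns before each request. The `trim_history` helper below is an illustrative sketch, not part of the OpenAI SDK:

```python
def trim_history(messages, max_turns=3):
    """Keep the system message (if present) plus the last `max_turns`
    user/assistant exchanges, dropping older turns to save prompt tokens."""
    system = [m for m in messages if m["role"] == "system"]
    recent = [m for m in messages if m["role"] != "system"][-max_turns * 2:]
    return system + recent

history = [
    {"role": "system", "content": "Be brief."},
    {"role": "user", "content": "Q1"}, {"role": "assistant", "content": "A1"},
    {"role": "user", "content": "Q2"}, {"role": "assistant", "content": "A2"},
    {"role": "user", "content": "Q3"}, {"role": "assistant", "content": "A3"},
]
trimmed = trim_history(history, max_turns=2)
print([m["content"] for m in trimmed])
# → ['Be brief.', 'Q2', 'A2', 'Q3', 'A3']
```

Pass `trimmed` as the `messages` argument to `client.chat.completions.create` instead of the full history. Dropping older turns can lose context the model needs, so tune `max_turns` to your use case.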

Troubleshooting

If you see unexpectedly high token usage, check for redundant or verbose prompts and conversation history. Use token counting tools to debug and trim inputs accordingly.

python
import tiktoken  # install via: pip install tiktoken

def count_tokens(text, model_name="gpt-4o"):
    """Return the number of tokens `text` uses under the model's encoding."""
    enc = tiktoken.encoding_for_model(model_name)
    return len(enc.encode(text))

prompt = "Explain token usage optimization."
print(f"Token count: {count_tokens(prompt)}")
output
Token count: 5
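If tiktoken isn't available, a common rule of thumb is that typical English text averages roughly four characters per token. The heuristic below is only a rough estimate for quick sanity checks before sending a request, not an exact count:

```python
def rough_token_estimate(text):
    # Heuristic: ~4 characters per token for typical English text.
    return max(1, len(text) // 4)

long_prompt = ("You are a helpful assistant. Please provide a detailed "
               "explanation about the benefits of token optimization in API calls.")
short_prompt = "Explain benefits of token optimization in API calls."

print(rough_token_estimate(long_prompt), rough_token_estimate(short_prompt))
```

The estimate confirms at a glance that the trimmed prompt is substantially cheaper than the verbose one; use tiktoken when you need exact counts.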

Key Takeaways

  • Trim prompts to essential information to reduce token count.
  • Limit max_tokens to control output length and cost.
  • Reuse context selectively to avoid repeating tokens.
  • Use smaller models for simpler tasks to save tokens.
  • Count tokens before sending requests to optimize usage.
Verified 2026-04 · gpt-4o, gpt-4o-mini