How to set max tokens in OpenAI API
Quick answer
Use the max_tokens parameter in the client.chat.completions.create() method to cap the number of tokens the model may generate in its response. Pass an integer token limit when calling the API.
Prerequisites
- Python 3.8+
- An OpenAI API key (free tier works)
- pip install "openai>=1.0"
Setup
Install the official OpenAI Python SDK and set your API key as an environment variable.
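Setting the key as an environment variable keeps it out of your source code. A minimal sketch for a Unix-like shell (the key value below is a placeholder, not a real key):

```shell
# Set the API key for the current shell session (replace the placeholder).
export OPENAI_API_KEY="sk-your-key-here"

# Add the line above to ~/.bashrc or ~/.zshrc to make it persist across sessions.
```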
```shell
pip install "openai>=1.0"
```
Step by step
This example shows how to set max_tokens to limit the response length when creating a chat completion with the gpt-4o model.
```python
import os

from openai import OpenAI

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Explain the theory of relativity."}],
    max_tokens=100,  # limit the response to at most 100 tokens
)

print(response.choices[0].message.content)
```
Output
The theory of relativity, developed by Albert Einstein, consists of two parts: special relativity and general relativity. Special relativity addresses the physics of objects moving at constant speeds, especially near the speed of light, introducing concepts like time dilation and length contraction. General relativity extends this to include gravity as the curvature of spacetime caused by mass and energy.
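If you need to know whether the 100-token cap actually cut the reply short, the API reports this in each choice's finish_reason field: "length" means the limit was hit, "stop" means the model finished naturally. A small helper (the commented usage assumes the response object from the example above):

```python
def hit_token_limit(finish_reason: str) -> bool:
    # The API sets finish_reason == "length" when generation stopped because
    # max_tokens was reached; "stop" indicates a natural ending.
    return finish_reason == "length"

# With a live response object from the call above:
# if hit_token_limit(response.choices[0].finish_reason):
#     print("Reply was truncated; consider raising max_tokens.")
```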
Common variations
- Use different models, such as gpt-4o-mini or gpt-4o, by changing the model parameter.
- For asynchronous calls, use the AsyncOpenAI client with asyncio.
- Streaming responses accept max_tokens as well; the stream simply stops once the limit is reached.
```python
import asyncio
import os

from openai import AsyncOpenAI  # the async client; OpenAI() is synchronous

async def main():
    client = AsyncOpenAI(api_key=os.environ["OPENAI_API_KEY"])
    # The async client uses the same create() method, awaited.
    response = await client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": "Summarize quantum computing."}],
        max_tokens=50,
    )
    print(response.choices[0].message.content)

asyncio.run(main())
```
Output
Quantum computing uses quantum bits or qubits, which can represent multiple states simultaneously, enabling complex computations much faster than classical computers for certain problems.
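The streaming variation mentioned above can be sketched as follows; max_tokens is passed exactly as in the non-streaming call. This is a sketch assuming the v1 SDK's stream=True flag and the same gpt-4o model; accumulate is a small helper (not part of the SDK) for joining the streamed content deltas:

```python
import os

def accumulate(deltas):
    # Join streamed content deltas, skipping the None placeholders the API
    # emits for role-only and finish chunks.
    return "".join(d for d in deltas if d is not None)

def stream_reply(prompt: str, limit: int = 50) -> str:
    from openai import OpenAI  # imported here so accumulate() stays dependency-free

    client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])
    stream = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}],
        max_tokens=limit,  # the stream stops once the limit is reached
        stream=True,
    )
    return accumulate(chunk.choices[0].delta.content for chunk in stream)
```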
Troubleshooting
- If you receive an error about token limits, make sure max_tokens plus your prompt's tokens do not exceed the model's maximum context length.
- Setting max_tokens too low may truncate responses mid-sentence; increase it if output is incomplete.
- Newer reasoning models (such as the o1 series) use max_completion_tokens instead of max_tokens; check the model's documentation if the parameter is rejected.
- Check that the OPENAI_API_KEY environment variable is set correctly to avoid authentication errors.
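A quick way to sanity-check the first troubleshooting item is to estimate the prompt's token count before calling the API. The ~4-characters-per-token heuristic and the 128,000-token default below are assumptions (the heuristic is rough for English text, and context windows vary by model); for exact counts against a specific model, use OpenAI's tiktoken library:

```python
def fits_context(prompt: str, max_tokens: int, context_window: int = 128_000) -> bool:
    # Rough heuristic: English text averages about 4 characters per token.
    # Use the tiktoken library for exact, model-specific counts.
    estimated_prompt_tokens = len(prompt) // 4 + 1
    return estimated_prompt_tokens + max_tokens <= context_window
```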
Key takeaways
- Set max_tokens in client.chat.completions.create() to control response length.
- Ensure total tokens (prompt + max_tokens) stay within the model's context limit to avoid errors.
- Use environment variables for API keys to keep credentials secure.
- Async calls support max_tokens the same way as sync calls.
- Adjust max_tokens based on the desired response detail and length.