BadRequestError
openai.BadRequestError (HTTP 400: max_completion_tokens too low for reasoning)
Stack trace
openai.BadRequestError: Error code: 400 - {'error': {'message': 'max_completion_tokens must be at least 1024 for o1 reasoning. Requested: 256', 'type': 'invalid_request_error', 'param': 'max_completion_tokens', 'code': 'invalid_parameter_value'}} Why it happens
OpenAI's o1 and o3 models allocate internal tokens for reasoning (thinking) before generating the final response. The max_completion_tokens parameter caps the TOTAL token budget (reasoning + output). If set below the model's minimum threshold, the reasoning chain cannot complete, and the API rejects the request with a 400 error. Unlike standard models where max_tokens=256 is valid, reasoning models need 1024+ tokens minimum to perform meaningful thinking.
Detection
Check your OpenAI API logs for 400 errors mentioning 'max_completion_tokens must be at least'. Monitor client-side by catching BadRequestError and checking if the error message contains 'max_completion_tokens'. Add a pre-flight validation: if using o1/o3, assert max_completion_tokens >= 1024 before sending the request.
Causes & fixes
Using max_completion_tokens < 1024 with o1/o3 models (copying pattern from gpt-4o where 256 works fine)
Set max_completion_tokens to at least 1024 for o1/o3. Start with 2048 for complex reasoning tasks. Example: max_completion_tokens=2048
Reusing legacy max_tokens parameter instead of max_completion_tokens for reasoning models
Replace max_tokens with max_completion_tokens. o1/o3 do NOT accept max_tokens. Use: max_completion_tokens=min(8192, expected_reasoning_tokens + expected_output_tokens)
Dynamically calculating max_completion_tokens based on input length without accounting for reasoning overhead
Reserve at least 50% of the token budget for internal reasoning. For a 16k context limit, allocate: max_completion_tokens = min(16000, len(prompt_tokens) * 2 + 1024) to leave breathing room for thinking
Model selection in code defaults to gpt-4o parameters which have lower minimums, then switched to o1 without updating token settings
Create model-specific config: if model in ['o1', 'o3-mini']: min_tokens = 1024; elif model in ['gpt-4o', 'gpt-4o-mini']: min_tokens = 256. Always set max_completion_tokens >= min_tokens for the selected model
Code: broken vs fixed
import os
from openai import OpenAI
client = OpenAI(api_key=os.environ.get('OPENAI_API_KEY'))
response = client.chat.completions.create(
model='o1',
messages=[{'role': 'user', 'content': 'Prove that sqrt(2) is irrational'}],
max_completion_tokens=256 # ❌ TOO LOW for o1 reasoning — will fail with 400 error
)
print(response.choices[0].message.content) import os
from openai import OpenAI, BadRequestError
client = OpenAI(api_key=os.environ.get('OPENAI_API_KEY'))
try:
response = client.chat.completions.create(
model='o1',
messages=[{'role': 'user', 'content': 'Prove that sqrt(2) is irrational'}],
max_completion_tokens=2048 # ✅ FIXED: o1 requires minimum 1024, 2048 is safe for reasoning
)
print('Reasoning output:', response.choices[0].message.content)
except BadRequestError as e:
if 'max_completion_tokens' in str(e):
print(f'Token limit error: {e}. Set max_completion_tokens >= 1024 for o1/o3 models.')
else:
raise Workaround
If you cannot immediately increase max_completion_tokens due to rate limit concerns, split the reasoning task into smaller sub-problems with separate API calls to o1 with reduced scope, each using max_completion_tokens=1024 minimum. Aggregate results client-side. Note: this is slower and costlier than one larger request with proper token allocation.
Prevention
Create a model config wrapper that enforces minimum token budgets per model type before API calls. Example: {'o1': {'min_max_completion_tokens': 1024}, 'o3-mini': {'min_max_completion_tokens': 1024}, 'gpt-4o': {'min_max_completion_tokens': 256}}. Validate every request: assert max_completion_tokens >= config[model]['min_max_completion_tokens']. This pattern prevents accidental token limit errors during model swaps.