High severity · HTTP 400 · Beginner · Fix: 2-5 min

BadRequestError

openai.BadRequestError (HTTP 400: max_completion_tokens too low for reasoning)

What this error means
OpenAI o1/o3 reasoning models need a minimum max_completion_tokens value to complete their internal reasoning chains. When the value is below that minimum, the request is rejected with an HTTP 400 instead of returning a truncated response.

Stack trace

traceback
openai.BadRequestError: Error code: 400 - {'error': {'message': 'max_completion_tokens must be at least 1024 for o1 reasoning. Requested: 256', 'type': 'invalid_request_error', 'param': 'max_completion_tokens', 'code': 'invalid_parameter_value'}}
QUICK FIX
Change max_completion_tokens=256 to max_completion_tokens=2048 when calling o1/o3 models; 1024 is the minimum, 2048+ is recommended for complex reasoning.

Why it happens

OpenAI's o1 and o3 models spend internal tokens on reasoning (thinking) before generating the final response. The max_completion_tokens parameter caps the TOTAL token budget (reasoning + output). If it is set below the model's minimum threshold, the reasoning chain cannot complete, and the API rejects the request with a 400 error. Unlike standard models, where max_tokens=256 is valid, reasoning models need at least 1024 tokens to perform meaningful thinking.

Detection

Check your OpenAI API logs for 400 errors mentioning 'max_completion_tokens must be at least'. Monitor client-side by catching BadRequestError and checking if the error message contains 'max_completion_tokens'. Add a pre-flight validation: if using o1/o3, assert max_completion_tokens >= 1024 before sending the request.
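
The client-side check described above can be sketched as a small parser over the error text. This is a minimal sketch that assumes the message follows the wording shown in the stack trace ("must be at least N ... Requested: M"). The function name parse_token_floor_error is hypothetical, and the pattern would need adjusting if the responses you receive are worded differently.

```python
import re
from typing import Optional, Tuple

# Pattern assumed from the sample message in this entry's stack trace;
# adjust it if the responses you receive are worded differently.
_FLOOR_PATTERN = re.compile(
    r"max_completion_tokens must be at least (\d+).*?Requested: (\d+)"
)


def parse_token_floor_error(message: str) -> Optional[Tuple[int, int]]:
    """Return (required_minimum, requested) for a token-floor error, else None."""
    match = _FLOOR_PATTERN.search(message)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


sample = (
    "Error code: 400 - {'error': {'message': 'max_completion_tokens must be "
    "at least 1024 for o1 reasoning. Requested: 256'}}"
)
print(parse_token_floor_error(sample))  # (1024, 256)
```

Inside an except BadRequestError as e: handler, passing str(e) to this function separates the token-floor case from other 400 errors and gives you both numbers for logging.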

Causes & fixes

1

Using max_completion_tokens < 1024 with o1/o3 models (copying pattern from gpt-4o where 256 works fine)

✓ Fix

Set max_completion_tokens to at least 1024 for o1/o3. Start with 2048 for complex reasoning tasks. Example: max_completion_tokens=2048

2

Reusing legacy max_tokens parameter instead of max_completion_tokens for reasoning models

✓ Fix

Replace max_tokens with max_completion_tokens. o1/o3 do NOT accept max_tokens. Keep the 1024 floor when computing the value. Use: max_completion_tokens=max(1024, min(8192, expected_reasoning_tokens + expected_output_tokens))

3

Dynamically calculating max_completion_tokens based on input length without accounting for reasoning overhead

✓ Fix

Reserve at least 50% of the token budget for internal reasoning. For a 16k context limit, allocate: max_completion_tokens = min(16000, len(prompt_tokens) * 2 + 1024) to leave breathing room for thinking

4

Code defaults to gpt-4o token settings, which tolerate lower values, and the model is later switched to o1 without updating those settings

✓ Fix

Create model-specific config: if model in ['o1', 'o3-mini']: min_tokens = 1024; elif model in ['gpt-4o', 'gpt-4o-mini']: min_tokens = 256. Always set max_completion_tokens >= min_tokens for the selected model
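
Fixes 2 and 3 above can be combined into one helper that picks the right parameter name and sizes the budget. This is a rough sketch, not a feature of the openai client: build_token_kwargs, REASONING_MODELS, and the constant names are hypothetical, the 1024 floor and 16000 ceiling are the figures quoted in this entry, and the prompt token count is assumed to be computed elsewhere (for example with a tokenizer).

```python
from typing import Dict

# Assumptions taken from this entry, not read from the API.
REASONING_MODELS = {"o1", "o3-mini"}
REASONING_FLOOR = 1024   # minimum quoted in the error message
BUDGET_CEILING = 16000   # the 16k example limit from fix 3


def build_token_kwargs(
    model: str, prompt_tokens: int, max_output_tokens: int = 256
) -> Dict[str, int]:
    """Return the token-limit keyword argument appropriate for `model`."""
    if model not in REASONING_MODELS:
        # Non-reasoning models: pass the caller's value through unchanged.
        return {"max_tokens": max_output_tokens}

    # Fix 3: prompt_tokens * 2 + 1024 leaves room for reasoning, capped at the ceiling.
    # The result is never below the floor because prompt_tokens is non-negative.
    budget = min(BUDGET_CEILING, prompt_tokens * 2 + REASONING_FLOOR)

    # Fix 2: reasoning models take max_completion_tokens, not max_tokens.
    return {"max_completion_tokens": budget}


print(build_token_kwargs("o1", prompt_tokens=100))
print(build_token_kwargs("gpt-4o", prompt_tokens=100))
```

The returned dictionary can be unpacked into the call, for example client.chat.completions.create(model=model, messages=messages, **build_token_kwargs(model, prompt_tokens)).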

Code: broken vs fixed

Broken - triggers the error
python
import os
from openai import OpenAI

client = OpenAI(api_key=os.environ.get('OPENAI_API_KEY'))

response = client.chat.completions.create(
    model='o1',
    messages=[{'role': 'user', 'content': 'Prove that sqrt(2) is irrational'}],
    max_completion_tokens=256  # ❌ TOO LOW for o1 reasoning — will fail with 400 error
)
print(response.choices[0].message.content)
Fixed - works correctly
python
import os
from openai import OpenAI, BadRequestError

client = OpenAI(api_key=os.environ.get('OPENAI_API_KEY'))

try:
    response = client.chat.completions.create(
        model='o1',
        messages=[{'role': 'user', 'content': 'Prove that sqrt(2) is irrational'}],
        max_completion_tokens=2048  # ✅ FIXED: o1 requires minimum 1024, 2048 is safe for reasoning
    )
    print('Reasoning output:', response.choices[0].message.content)
except BadRequestError as e:
    if 'max_completion_tokens' in str(e):
        print(f'Token limit error: {e}. Set max_completion_tokens >= 1024 for o1/o3 models.')
    else:
        raise
Changed max_completion_tokens from 256 (fine for gpt-4o) to 2048 (above the 1024 minimum, with headroom for o1 reasoning), and added BadRequestError handling to catch token limit violations with a helpful message.

Workaround

If you cannot immediately increase max_completion_tokens due to rate limit concerns, split the reasoning task into smaller sub-problems and send each one to o1 as a separate, narrower API call with max_completion_tokens set to at least 1024. Aggregate the results client-side. Note: this is slower and costlier than one larger request with proper token allocation.
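
The split-and-aggregate workaround might look like the sketch below. It makes no API call itself: ask is a hypothetical callable you would supply to wrap your own client call, and solve_in_parts and MIN_TOKENS_PER_CALL are illustrative names. How the task is divided into sub-prompts is left to the caller.

```python
from typing import Callable, List

MIN_TOKENS_PER_CALL = 1024  # floor quoted in this entry for o1/o3


def solve_in_parts(
    sub_prompts: List[str],
    ask: Callable[[str, int], str],
    tokens_per_call: int = MIN_TOKENS_PER_CALL,
) -> str:
    """Send each sub-problem separately and join the answers client-side."""
    if tokens_per_call < MIN_TOKENS_PER_CALL:
        raise ValueError(
            f"tokens_per_call must be >= {MIN_TOKENS_PER_CALL}, got {tokens_per_call}"
        )
    # One call per sub-problem; `ask` receives the prompt and the token budget.
    answers = [ask(prompt, tokens_per_call) for prompt in sub_prompts]
    return "\n\n".join(answers)


def fake_ask(prompt: str, max_completion_tokens: int) -> str:
    """Stand-in for a real API call, used here only to show the flow."""
    return f"[{max_completion_tokens}] answer to: {prompt}"


print(solve_in_parts(["Step 1", "Step 2"], fake_ask))
```

In real use, ask would call client.chat.completions.create with the given prompt and pass the budget as max_completion_tokens.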

Prevention

Create a model config wrapper that enforces minimum token budgets per model type before API calls. Example: {'o1': {'min_max_completion_tokens': 1024}, 'o3-mini': {'min_max_completion_tokens': 1024}, 'gpt-4o': {'min_max_completion_tokens': 256}}. Validate every request: assert max_completion_tokens >= config[model]['min_max_completion_tokens']. This pattern prevents accidental token limit errors during model swaps.
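
One way to sketch that wrapper is shown below, using the config dictionary from the paragraph above. The names MODEL_CONFIG and validate_request are hypothetical. It raises ValueError rather than using assert, because assert statements are skipped when Python runs with -O; treating unknown models as an error is a design choice of this sketch, not a requirement.

```python
from typing import Dict

# Per-model floors copied from this entry; extend as you add models.
MODEL_CONFIG: Dict[str, Dict[str, int]] = {
    "o1": {"min_max_completion_tokens": 1024},
    "o3-mini": {"min_max_completion_tokens": 1024},
    "gpt-4o": {"min_max_completion_tokens": 256},
}


def validate_request(model: str, max_completion_tokens: int) -> None:
    """Raise before the request is sent if the budget is below the model's floor."""
    config = MODEL_CONFIG.get(model)
    if config is None:
        # Unknown models fail loudly so a model swap cannot bypass the check.
        raise KeyError(f"No token config for model {model!r}")
    minimum = config["min_max_completion_tokens"]
    if max_completion_tokens < minimum:
        raise ValueError(
            f"{model} needs max_completion_tokens >= {minimum}, "
            f"got {max_completion_tokens}"
        )


validate_request("o1", 2048)  # passes silently
```

Calling validate_request(model, max_completion_tokens) immediately before every API call keeps the check in one place when models are swapped.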

Python 3.9+ · openai >=1.3.0 · tested on 1.45.x (April 2026)
Verified 2026-04 · o1, o3-mini, gpt-4o