High severity · HTTP 400 · Intermediate · Fix: 5-10 min

AzureOpenAIError

azure.ai.openai.AzureOpenAIError

What this error means

The Azure OpenAI deployment rejected the request because the prompt tokens plus the requested completion tokens exceed the model's maximum context length.

Stack trace

traceback

azure.ai.openai.AzureOpenAIError: The request was rejected because the total tokens in the prompt and completion exceed the model's maximum context length.
    at azure.ai.openai._client._client._raise_if_error (azure/ai/openai/_client.py:123)
    at azure.ai.openai._client._client._send_request (azure/ai/openai/_client.py:98)
    at azure.ai.openai._client.OpenAIClient.chat_completions.create (azure/ai/openai/_client.py:45)
    at main.py:42

QUICK FIX

Shorten the prompt or lower max_tokens so that prompt tokens plus max_tokens stay within the deployment's maximum context length.

Why it happens

Each Azure OpenAI model has a fixed maximum context length that caps the prompt tokens plus the completion tokens. When the prompt token count plus the requested max_tokens exceeds this limit, the deployment rejects the request with this error.

Detection

Monitor API responses for AzureOpenAIError with a message saying the context length was exceeded. Log the prompt token count and the requested max_tokens for each call so you can see which requests breach the limit.
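One way to wire up this detection is sketched below. It matches on the message text shown in the stack trace above, because this page does not list a structured error code. The helper names `is_context_length_error` and `log_failed_request` and the logger name are hypothetical, and the exception is treated as a plain Python exception rather than a specific SDK class.

```python
import logging

logger = logging.getLogger("azure_openai_calls")  # hypothetical logger name

# Substring taken from the error message in the stack trace above.
_CONTEXT_MARKERS = ("maximum context length",)


def is_context_length_error(exc: Exception) -> bool:
    """Heuristic: decide by message text whether the context limit was hit."""
    text = str(exc).lower()
    return any(marker in text for marker in _CONTEXT_MARKERS)


def log_failed_request(exc: Exception, prompt_tokens: int, max_tokens: int) -> bool:
    """Log the token budget of a failed call.

    Returns True when the failure looks like a context-length error.
    """
    if is_context_length_error(exc):
        logger.warning(
            "context length exceeded: prompt_tokens=%d max_tokens=%d total=%d",
            prompt_tokens, max_tokens, prompt_tokens + max_tokens,
        )
        return True
    logger.error("request failed for another reason: %s", exc)
    return False
```

Calling `log_failed_request` from the `except` block around the request keeps the prompt size and max_tokens next to every failure in the logs. String matching is brittle, so prefer a structured error code if your SDK version exposes one.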

Causes & fixes

Prompt plus max_tokens parameter exceeds the model's maximum context length.

✓ Fix

Reduce the prompt length or lower the max_tokens parameter to ensure total tokens fit within the model's context window.
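Lowering max_tokens can be automated once the prompt token count is known. The sketch below assumes you already have that count and the deployment's context length. `clamp_max_tokens` is a hypothetical helper, not an SDK function.

```python
def clamp_max_tokens(
    prompt_tokens: int,
    requested_max_tokens: int,
    context_length: int,
    min_completion: int = 1,
) -> int:
    """Lower max_tokens so that prompt + completion fits the context window.

    Raises ValueError when the prompt alone leaves no room for a completion,
    because in that case only shortening the prompt can help.
    """
    available = context_length - prompt_tokens
    if available < min_completion:
        raise ValueError(
            f"prompt uses {prompt_tokens} tokens, leaving {available} "
            f"of {context_length}; shorten the prompt instead"
        )
    return min(requested_max_tokens, available)
```

For example, with a 7500-token prompt and an 8192-token window, a request for 1000 completion tokens is clamped to 692, while a request for 500 passes through unchanged.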

Using a deployment with a smaller context length than expected (e.g., 2048 tokens instead of 8192).

✓ Fix

Verify the context length of the deployment's model in the Azure portal and adjust the prompt size or max_tokens accordingly.
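To avoid assuming the wrong limit in code, one option is to record the verified context length per deployment and look it up before each request. The deployment names below are placeholders and the values only mirror the 2048 and 8192 figures used as an example above. Replace both with what you confirm for your own deployments.

```python
# Hypothetical deployment names; the limits are illustrative, not looked up.
DEPLOYMENT_CONTEXT_LENGTHS = {
    "my-deployment": 8192,
    "my-small-deployment": 2048,
}


def context_length_for(deployment: str) -> int:
    """Return the recorded context length, failing loudly if none is recorded."""
    try:
        return DEPLOYMENT_CONTEXT_LENGTHS[deployment]
    except KeyError:
        raise KeyError(
            f"no context length recorded for deployment {deployment!r}"
        ) from None
```

Failing on an unknown deployment is deliberate here: a silent default would reintroduce the mismatch this fix is meant to catch.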

Not accounting for token overhead from system or user messages in chat completions.

✓ Fix

Include all messages and system prompts in token count estimation before sending the request.
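A minimal sketch of such an estimate follows. The character-based counter is a crude stand-in for a real tokenizer, and the per-message and reply overheads are assumed values rather than documented constants for any model. Pass your own `count_tokens` function and overheads for accurate numbers. All names here are hypothetical.

```python
from typing import Callable, Dict, List

Message = Dict[str, str]


def rough_token_count(text: str) -> int:
    """Crude stand-in for a real tokenizer: about 4 characters per token."""
    return (len(text) + 3) // 4  # ceiling of len(text) / 4


def estimate_chat_tokens(
    messages: List[Message],
    count_tokens: Callable[[str], int] = rough_token_count,
    per_message_overhead: int = 4,  # assumed framing cost per message
    reply_overhead: int = 3,        # assumed cost of priming the reply
) -> int:
    """Sum tokens over every message, including system prompts and roles."""
    total = reply_overhead
    for message in messages:
        total += per_message_overhead
        total += count_tokens(message["role"])
        total += count_tokens(message["content"])
    return total
```

The estimate covers the whole `messages` list, so a long system prompt or a long conversation history is counted along with the latest user message.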

Code: broken vs fixed

Broken - triggers the error

python

import os
from azure.ai.openai import OpenAIClient

client = OpenAIClient(os.environ["AZURE_OPENAI_ENDPOINT"], credential=os.environ["AZURE_OPENAI_KEY"])

response = client.chat_completions.create(
    deployment_id="my-deployment",
    messages=[{"role": "user", "content": "Very long prompt text..."}],
    max_tokens=1000  # Prompt tokens + 1000 exceed the deployment's context length
)
print(response.choices[0].message.content)

Fixed - works correctly

python

import os
from azure.ai.openai import OpenAIClient

client = OpenAIClient(os.environ["AZURE_OPENAI_ENDPOINT"], credential=os.environ["AZURE_OPENAI_KEY"])

# Reduced prompt length or max_tokens to fit context length
response = client.chat_completions.create(
    deployment_id="my-deployment",
    messages=[{"role": "user", "content": "Shorter prompt text..."}],
    max_tokens=500  # Reduced max_tokens to avoid exceeding context length
)
print(response.choices[0].message.content)

The fixed version shortens the prompt and lowers max_tokens so that the total token count fits within the deployment's maximum context length.

⚠ Workaround

Catch AzureOpenAIError exceptions, then truncate or summarize the prompt dynamically before retrying the request to stay within token limits.
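The retry loop could look roughly like the sketch below. It assumes `send` is your own wrapper around the chat completion call and `is_context_error` is a predicate that recognises this error; both are hypothetical. Truncation here simply drops the oldest non-system message, which is one strategy among several and does not cover summarizing.

```python
from typing import Any, Callable, Dict, List

Message = Dict[str, str]


def drop_oldest_turn(messages: List[Message]) -> List[Message]:
    """Return a copy without the oldest non-system message."""
    for index, message in enumerate(messages):
        if message["role"] != "system":
            return messages[:index] + messages[index + 1:]
    return list(messages)


def send_with_truncation(
    send: Callable[[List[Message]], Any],
    messages: List[Message],
    is_context_error: Callable[[Exception], bool],
    max_retries: int = 3,
) -> Any:
    """Call send(); on a context-length error, trim history and retry."""
    current = list(messages)  # never mutate the caller's list
    for attempt in range(max_retries + 1):
        try:
            return send(current)
        except Exception as exc:  # narrow this to your SDK's error class
            if not is_context_error(exc) or attempt == max_retries:
                raise
            non_system = [m for m in current if m["role"] != "system"]
            if len(non_system) <= 1:
                raise  # only the latest message is left; trimming cannot help
            current = drop_oldest_turn(current)
```

Other errors and exhausted retries are re-raised unchanged, so the caller still sees the original exception when truncation is not enough.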

✓ Prevention

Count the tokens in all input messages and add max_tokens before sending each request, and configure alerts for requests that approach the context length limit so that failures are caught before they reach the API.
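A pre-flight check along these lines might look like the following sketch. The 80% warning threshold, the logger name, and the function name `preflight_check` are assumptions for illustration. The warning log line is where an alert would hook in.

```python
import logging

logger = logging.getLogger("token_budget")  # hypothetical logger name


def preflight_check(
    prompt_tokens: int,
    max_tokens: int,
    context_length: int,
    warn_ratio: float = 0.8,  # assumed alert threshold, tune to taste
) -> str:
    """Classify a planned request as 'ok', 'warn', or 'reject'."""
    total = prompt_tokens + max_tokens
    if total > context_length:
        logger.error("request needs %d tokens, limit is %d", total, context_length)
        return "reject"
    if total >= warn_ratio * context_length:
        logger.warning("request uses %d of %d tokens", total, context_length)
        return "warn"
    return "ok"
```

Sending only requests that return "ok" or "warn", and tracking how often "warn" occurs, gives early notice that prompts are growing toward the limit.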

Python 3.9+ · azure-ai-openai >=1.0.0 · tested on 1.1.x

Verified 2026-04
