AzureOpenAIError
azure.ai.openai.AzureOpenAIError
Stack trace
azure.ai.openai.AzureOpenAIError: The request was rejected because the total tokens in the prompt and completion exceed the model's maximum context length.
at azure.ai.openai._client._client._raise_if_error (azure/ai/openai/_client.py:123)
at azure.ai.openai._client._client._send_request (azure/ai/openai/_client.py:98)
at azure.ai.openai._client.OpenAIClient.chat_completions.create (azure/ai/openai/_client.py:45)
at main.py:42
Why it happens
Every Azure OpenAI model has a fixed maximum context length: a cap on the prompt tokens plus the completion tokens requested through max_tokens. When the prompt and the requested max_tokens together exceed that cap, the deployment rejects the request with this error.
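The rule reduces to one inequality: prompt tokens plus max_tokens must not exceed the context length. A minimal check, assuming you already know the prompt's token count and the deployment's context length (the 4096-token window below is illustrative, not a property of any particular model):

```python
def fits_context(prompt_tokens: int, max_tokens: int, context_length: int) -> bool:
    """Return True if the prompt plus the requested completion fit the context window."""
    # The check uses the requested max_tokens, not the length of the completion
    # the model would actually generate.
    return prompt_tokens + max_tokens <= context_length


print(fits_context(3500, 1000, 4096))  # False: 3500 + 1000 = 4500 > 4096
print(fits_context(3500, 500, 4096))   # True: 3500 + 500 = 4000 <= 4096
```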
Detection
Watch for AzureOpenAIError exceptions whose message reports that the maximum context length was exceeded. Log the prompt token count and the requested max_tokens with each failure so you can see when, and by how much, the limit was breached.
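One way to do this is to classify errors by their message text and log the token budget alongside them. This is a sketch under stated assumptions: the marker substrings are guesses at typical wording (check them against the messages your SDK and API version actually return), and a stand-in RuntimeError replaces the real exception so the example runs without a live deployment.

```python
import logging

logger = logging.getLogger("context_length")

# Assumed markers: phrases expected in context-length rejections.
# Adjust them to the exact wording your SDK and API version return.
CONTEXT_LENGTH_MARKERS = ("maximum context length", "context_length_exceeded")


def is_context_length_error(exc: Exception) -> bool:
    """Heuristically classify an exception as a context-length rejection by its text."""
    text = str(exc).lower()
    return any(marker in text for marker in CONTEXT_LENGTH_MARKERS)


def log_context_error(exc: Exception, prompt_tokens: int, max_tokens: int) -> bool:
    """Log the token budget if exc looks like a context-length error; return whether it did."""
    if not is_context_length_error(exc):
        return False
    logger.warning(
        "Context length exceeded: prompt_tokens=%d max_tokens=%d total=%d",
        prompt_tokens,
        max_tokens,
        prompt_tokens + max_tokens,
    )
    return True


# Stand-in exception carrying the message from the stack trace above.
err = RuntimeError(
    "The request was rejected because the total tokens in the prompt and completion "
    "exceed the model's maximum context length."
)
print(log_context_error(err, 3500, 1000))  # True, and a warning is logged
```

Matching on message text is brittle, so prefer a structured error code if your client exposes one.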
Causes & fixes
The prompt plus the max_tokens parameter exceeds the model's maximum context length.
Shorten the prompt or lower max_tokens so that the total fits within the model's context window.
The deployment has a smaller context length than expected (e.g., 2048 tokens instead of 8192).
Check the context length of the deployment's model in the Azure portal and size the prompt and max_tokens accordingly.
Token counts ignore the system message and earlier messages in a chat completion.
Count the tokens of every message, including the system prompt, before sending the request.
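The three fixes can be combined into a pre-send step that counts every message and lowers max_tokens to whatever room is left. A minimal sketch with loudly stated assumptions: the deployment table is hypothetical (fill it from the Azure portal), and the token estimate is a rough heuristic of about 4 characters per token plus a few tokens of framing per message. For exact counts, use the model's own tokenizer, for example OpenAI's tiktoken library.

```python
import math

# Hypothetical deployment-to-context-length table: copy the real values from
# each deployment's model details in the Azure portal.
DEPLOYMENT_CONTEXT_LENGTHS = {"my-deployment": 8192}

# Rough heuristics (assumptions, not exact tokenizer output): about 4 characters
# per token for English text, plus a few tokens of framing per chat message.
CHARS_PER_TOKEN = 4
PER_MESSAGE_OVERHEAD = 4


def estimate_message_tokens(messages: list[dict]) -> int:
    """Approximate prompt tokens across ALL messages, system prompt included."""
    total = 0
    for message in messages:
        content = message.get("content") or ""
        total += PER_MESSAGE_OVERHEAD + math.ceil(len(content) / CHARS_PER_TOKEN)
    return total


def clamp_max_tokens(messages: list[dict], deployment: str, requested: int) -> int:
    """Return a max_tokens value that fits the deployment's context window."""
    context_length = DEPLOYMENT_CONTEXT_LENGTHS[deployment]
    available = context_length - estimate_message_tokens(messages)
    if available <= 0:
        raise ValueError("The messages alone exceed the context length; shorten them first.")
    return min(requested, available)


messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "x" * 30000},
]
print(estimate_message_tokens(messages))                  # 7515
print(clamp_max_tokens(messages, "my-deployment", 1000))  # 677
```

Because the estimate is approximate, leaving a safety margin (for example, clamping to a little less than the computed room) is a reasonable design choice.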
Code: broken vs fixed
# Broken: a long prompt plus max_tokens=1000 exceeds the context length
import os
from azure.ai.openai import OpenAIClient

client = OpenAIClient(os.environ["AZURE_OPENAI_ENDPOINT"], credential=os.environ["AZURE_OPENAI_KEY"])
response = client.chat_completions.create(
    deployment_id="my-deployment",
    messages=[{"role": "user", "content": "Very long prompt text..."}],
    max_tokens=1000,  # prompt tokens + 1000 exceed the model's context length
)
print(response.choices[0].message.content)

# Fixed: a shorter prompt and a smaller max_tokens fit within the context length
import os
from azure.ai.openai import OpenAIClient

client = OpenAIClient(os.environ["AZURE_OPENAI_ENDPOINT"], credential=os.environ["AZURE_OPENAI_KEY"])
response = client.chat_completions.create(
    deployment_id="my-deployment",
    messages=[{"role": "user", "content": "Shorter prompt text..."}],
    max_tokens=500,  # leaves room for the prompt within the context length
)
print(response.choices[0].message.content)
Workaround
Catch AzureOpenAIError and, only when the message reports a context-length failure, truncate or summarize the prompt before retrying the request so it stays within the token limit.
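A minimal sketch of the catch-truncate-retry loop. Assumptions: `send` is a hypothetical wrapper you write around your own client call that takes the messages and max_tokens; `is_context_error` is any predicate that recognizes a context-length failure (a message-text check works); truncation drops the oldest non-system message and, when only one is left, keeps its second half on the assumption that the question sits at the end. Summarizing the dropped history instead of discarding it is an alternative this sketch does not show.

```python
from typing import Callable


def truncate_messages(messages: list[dict]) -> list[dict]:
    """Drop the oldest non-system message; if only one remains, keep its second half."""
    system = [m for m in messages if m["role"] == "system"]
    others = [m for m in messages if m["role"] != "system"]
    if not others:
        raise ValueError("Only system messages remain; nothing left to truncate.")
    if len(others) > 1:
        return system + others[1:]
    content = others[0]["content"]
    # Assumption: the end of the prompt holds the actual question, so keep the tail.
    return system + [{**others[0], "content": content[len(content) // 2 :]}]


def call_with_truncation(
    send: Callable[[list[dict], int], object],
    messages: list[dict],
    max_tokens: int,
    is_context_error: Callable[[Exception], bool],
    max_attempts: int = 3,
) -> object:
    """Call send(); on a context-length error, truncate the messages and retry."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return send(messages, max_tokens)
        except Exception as exc:
            # Re-raise unrelated errors, and the last failure, unchanged.
            if not is_context_error(exc) or attempt == max_attempts - 1:
                raise
            messages = truncate_messages(messages)


# Usage with a fake sender that rejects prompts longer than 20 characters.
def fake_send(msgs: list[dict], max_tokens: int) -> str:
    if sum(len(m["content"]) for m in msgs) > 20:
        raise RuntimeError("exceeds the model's maximum context length")
    return "ok"


history = [
    {"role": "system", "content": "sys"},
    {"role": "user", "content": "a" * 30},
    {"role": "user", "content": "short q"},
]
print(call_with_truncation(fake_send, history, 100, lambda e: "maximum context length" in str(e)))  # ok
```

Retrying only on context-length errors matters: truncating the prompt will not fix an authentication or throttling failure.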
Prevention
Count the tokens in all input messages, add max_tokens, and compare the total against the deployment's context length before sending each request. Configure alerts that fire as requests approach the limit so you can act before they start failing at runtime.
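A pre-flight guard can enforce the budget and raise an early warning. This sketch assumes the prompt token count comes from your own counting step, uses a hypothetical 90% alert threshold, and only logs a warning where a real system would emit a metric or alert (for example, one watched by an Azure Monitor alert rule).

```python
import logging

logger = logging.getLogger("token_budget")

ALERT_RATIO = 0.9  # assumption: warn when a request uses 90% of the window


class ContextBudgetError(ValueError):
    """Raised before sending when a request cannot fit the context window."""


def preflight(prompt_tokens: int, max_tokens: int, context_length: int) -> float:
    """Check a request's token budget before sending it; return the usage ratio."""
    total = prompt_tokens + max_tokens
    if total > context_length:
        raise ContextBudgetError(
            f"{total} tokens requested ({prompt_tokens} prompt + {max_tokens} "
            f"completion) exceed the {context_length}-token context length"
        )
    ratio = total / context_length
    if ratio >= ALERT_RATIO:
        # Hook your alerting here; this sketch only logs a warning.
        logger.warning("Token budget at %.0f%% of the context length", ratio * 100)
    return ratio


print(round(preflight(1000, 500, 8192), 3))  # 0.183
print(round(preflight(7000, 500, 8192), 3))  # 0.916, and a warning is logged
```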