QuotaExceededDeploymentCapacity
azure.core.exceptions.HttpResponseError: QuotaExceededDeploymentCapacity
Stack trace
azure.core.exceptions.HttpResponseError: (429) QuotaExceededDeploymentCapacity: The deployment capacity for your Azure OpenAI resource has been exceeded. Please reduce usage or request a quota increase.
at azure.ai.openai._client._client._raise_for_status(response)
at azure.ai.openai._client._client._send_request(...)
at azure.ai.openai.OpenAIClient.get_chat_completions(...)
...
Why it happens
Azure OpenAI caps deployment capacity per subscription and region to manage resource allocation. When requests against your resource exceed that cap, the service rejects further requests with HTTP 429 and this error code. This typically happens during traffic spikes, or when a workload has outgrown the quota assigned at initial provisioning.
Detection
Monitor Azure OpenAI usage metrics and quota limits via the Azure Portal or Azure CLI. Set alerts that fire as usage approaches the quota, so you can act before requests start failing in production.
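Portal and CLI metrics can be complemented by a client-side check. The following is a minimal sketch, not an Azure API: it assumes the quota is expressed as tokens per minute, and the class name UsageWindow, the 80% warning ratio, and the limit passed in are all hypothetical values chosen for illustration.

```python
import time
from collections import deque


class UsageWindow:
    """Tracks tokens consumed over a sliding 60-second window."""

    def __init__(self, limit_per_minute, warn_ratio=0.8, clock=time.monotonic):
        self.limit = limit_per_minute  # assumed quota, in tokens per minute
        self.warn_ratio = warn_ratio   # fraction of the quota that triggers an alert
        self.clock = clock             # injectable so tests can control time
        self.events = deque()          # (timestamp, tokens) pairs, oldest first
        self.total = 0                 # tokens currently inside the window

    def record(self, tokens):
        """Record the token usage of one completed request."""
        self.events.append((self.clock(), tokens))
        self.total += tokens

    def utilization(self):
        """Return the fraction of the limit used in the last 60 seconds."""
        cutoff = self.clock() - 60
        # Drop events that have aged out of the window.
        while self.events and self.events[0][0] <= cutoff:
            _, tokens = self.events.popleft()
            self.total -= tokens
        return self.total / self.limit

    def should_alert(self):
        """True once usage reaches the warning ratio."""
        return self.utilization() >= self.warn_ratio
```

Calling record() after each response and checking should_alert() gives an early signal from inside the application; it does not replace the service-side metrics, which remain the authoritative view of quota usage.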
Causes & fixes
Cause: Your Azure OpenAI resource's deployment capacity quota is fully consumed by active requests or concurrent deployments.
Fix: Reduce concurrent requests or scale down active deployments. Alternatively, request a quota increase from Azure support for your subscription and region.
Cause: Multiple applications or services share the same Azure OpenAI resource and collectively exceed its deployment capacity.
Fix: Isolate workloads by creating a separate Azure OpenAI resource per application, or coordinate usage across them to stay within the quota.
Cause: Your subscription is new, or its default quota is too low for your workload.
Fix: Submit a quota increase request through the Azure Portal under 'Help + support' > 'New support request' > 'Quota'.
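Reducing concurrent requests can be enforced in the client rather than left to convention. This is a rough sketch using a semaphore from the Python standard library; MAX_IN_FLIGHT, limited, and run_all are hypothetical names, and the cap of 4 is an arbitrary placeholder rather than a recommended value.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

MAX_IN_FLIGHT = 4  # hypothetical cap; tune to your deployment's capacity

_slots = threading.BoundedSemaphore(MAX_IN_FLIGHT)


def limited(call, *args, **kwargs):
    """Run call(*args, **kwargs) while holding one of the in-flight slots."""
    with _slots:
        return call(*args, **kwargs)


def run_all(call, inputs, workers=16):
    """Process every input, with at most MAX_IN_FLIGHT calls running at once."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Worker threads beyond the cap block on the semaphore until a slot frees up.
        return list(pool.map(lambda item: limited(call, item), inputs))
```

Here call would be whatever function issues the chat completion request. The semaphore only limits one process; applications sharing a resource would still need to coordinate their caps.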
Code: broken vs fixed
# Broken: the call is unguarded, so a quota error propagates to the caller
from azure.ai.openai import OpenAIClient
import os

client = OpenAIClient(os.environ['AZURE_OPENAI_ENDPOINT'], credential=os.environ['AZURE_OPENAI_KEY'])
response = client.get_chat_completions(deployment_id='my-deployment', messages=[{'role': 'user', 'content': 'Hello'}])  # This line triggers quota exceeded error

# Fixed: catch the error and handle the quota case explicitly
from azure.ai.openai import OpenAIClient
import os
import azure.core.exceptions

client = OpenAIClient(os.environ['AZURE_OPENAI_ENDPOINT'], credential=os.environ['AZURE_OPENAI_KEY'])
try:
    response = client.get_chat_completions(deployment_id='my-deployment', messages=[{'role': 'user', 'content': 'Hello'}])
    print(response.choices[0].message.content)
except azure.core.exceptions.HttpResponseError as e:
    if 'QuotaExceededDeploymentCapacity' in str(e):
        print('Quota exceeded: reduce usage or request quota increase.')
    else:
        raise
Workaround
Catch the HttpResponseError exception, check its message for QuotaExceededDeploymentCapacity, and either retry with exponential backoff or temporarily degrade non-essential features until capacity frees up.
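One way the backoff retry could look, as a sketch under stated assumptions: call_with_backoff and is_quota_error are hypothetical helpers, the delays are placeholder values, and the message check mirrors the string match used in the fixed example. To keep the block self-contained it catches Exception; in real code you would narrow that to azure.core.exceptions.HttpResponseError.

```python
import random
import time


def is_quota_error(exc):
    """Heuristic: match the error code in the exception message."""
    return "QuotaExceededDeploymentCapacity" in str(exc)


def call_with_backoff(call, max_attempts=5, base_delay=1.0, max_delay=30.0,
                      sleep=time.sleep):
    """Retry call() on quota errors, doubling the wait after each failure."""
    for attempt in range(max_attempts):
        try:
            return call()
        except Exception as exc:
            # Re-raise anything that is not a quota error, and give up
            # once the final attempt has failed.
            if not is_quota_error(exc) or attempt == max_attempts - 1:
                raise
            delay = min(max_delay, base_delay * 2 ** attempt)
            # Jitter spreads retries out so clients do not retry in lockstep.
            sleep(delay + random.uniform(0, delay / 2))
```

The request is passed in as a zero-argument callable, for example a lambda wrapping the chat completion call. Capping the delay and the number of attempts keeps a sustained quota shortage from stalling callers indefinitely; at that point degrading features is the better option.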
Prevention
Architect your system to monitor Azure OpenAI quota usage proactively and request quota increases before hitting limits. Use separate resources for high-demand workloads to avoid shared capacity exhaustion.
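Routing high-demand workloads to their own resource can be as simple as a lookup table. The sketch below is illustrative only: the workload names, endpoints, and deployment names are made up, and pick_resource is a hypothetical helper rather than part of any SDK.

```python
DEFAULT_WORKLOAD = "default"


def pick_resource(workload, resources):
    """Return the config for a workload, falling back to the shared default."""
    return resources.get(workload, resources[DEFAULT_WORKLOAD])


# Hypothetical endpoints and deployment names, for illustration only.
RESOURCES = {
    "default": {
        "endpoint": "https://shared.example.invalid",
        "deployment": "shared-deployment",
    },
    "batch-summaries": {
        "endpoint": "https://batch.example.invalid",
        "deployment": "batch-deployment",
    },
}
```

With this layout, a heavy batch job draws on its own resource's quota, so it cannot exhaust the capacity that interactive traffic depends on.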