Azure OpenAI disaster recovery
Quick answer
Implement disaster recovery for Azure OpenAI by deploying your resources in multiple Azure regions and configuring failover mechanisms. Use retry logic and monitor service health to maintain availability during outages or regional failures.
PREREQUISITES
- Python 3.8+
- An Azure OpenAI API key for each regional resource
- pip install "openai>=1.0"
- An Azure subscription with Azure OpenAI resources in at least two regions
Set up multi-region Azure OpenAI
To support disaster recovery, create an Azure OpenAI resource, with a model deployment, in at least two Azure regions. If the primary region becomes unavailable, your application can then send requests to the secondary region instead.
Configure environment variables for each region's endpoint, API key, and deployment name.
```python
import os
from openai import AzureOpenAI

# Primary region client. Each regional Azure OpenAI resource has its own
# endpoint and its own API keys, so each client reads its own variables.
primary_client = AzureOpenAI(
    api_key=os.environ["AZURE_OPENAI_PRIMARY_API_KEY"],
    azure_endpoint=os.environ["AZURE_OPENAI_PRIMARY_ENDPOINT"],
    api_version="2024-02-01",
)

# Secondary region client
secondary_client = AzureOpenAI(
    api_key=os.environ["AZURE_OPENAI_SECONDARY_API_KEY"],
    azure_endpoint=os.environ["AZURE_OPENAI_SECONDARY_ENDPOINT"],
    api_version="2024-02-01",
)

print("Clients for primary and secondary regions initialized.")
```

Output:
Clients for primary and secondary regions initialized.
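A missing variable for the secondary region would otherwise surface only during an outage, so it can help to validate the whole configuration at startup. The sketch below assumes a hypothetical `AZURE_OPENAI_<PREFIX>_ENDPOINT` / `_API_KEY` / `_DEPLOYMENT` naming scheme and a hypothetical `RegionConfig` type; it only reads variables and makes no network calls.

```python
import os
from dataclasses import dataclass
from typing import List, Mapping, Optional, Sequence


@dataclass(frozen=True)
class RegionConfig:
    """Connection settings for one regional Azure OpenAI resource (hypothetical type)."""
    name: str
    endpoint: str
    api_key: str
    deployment: str


def load_region_configs(
    prefixes: Sequence[str] = ("PRIMARY", "SECONDARY"),
    env: Optional[Mapping[str, str]] = None,
) -> List[RegionConfig]:
    """Return one RegionConfig per prefix, in failover priority order.

    Assumes the hypothetical naming scheme AZURE_OPENAI_<PREFIX>_ENDPOINT,
    AZURE_OPENAI_<PREFIX>_API_KEY and AZURE_OPENAI_<PREFIX>_DEPLOYMENT.
    Raises KeyError listing every missing variable at once.
    """
    env = os.environ if env is None else env
    configs, missing = [], []
    for prefix in prefixes:
        # Map each dataclass field to the environment variable that holds it.
        names = {
            field: f"AZURE_OPENAI_{prefix}_{field.upper()}"
            for field in ("endpoint", "api_key", "deployment")
        }
        absent = [var for var in names.values() if var not in env]
        missing.extend(absent)
        if not absent:
            configs.append(
                RegionConfig(
                    name=prefix.lower(),
                    **{field: env[var] for field, var in names.items()},
                )
            )
    if missing:
        raise KeyError(f"Missing environment variables: {', '.join(missing)}")
    return configs
```

Call `load_region_configs()` once at startup, then build one client per returned entry, keeping the returned order as the failover order.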
Step-by-step disaster recovery code
This example demonstrates how to call the Azure OpenAI chat completion endpoint with automatic failover to a secondary region if the primary region fails.
```python
import os
from openai import AzureOpenAI, OpenAIError

primary_client = AzureOpenAI(
    api_key=os.environ["AZURE_OPENAI_PRIMARY_API_KEY"],
    azure_endpoint=os.environ["AZURE_OPENAI_PRIMARY_ENDPOINT"],
    api_version="2024-02-01",
)
secondary_client = AzureOpenAI(
    api_key=os.environ["AZURE_OPENAI_SECONDARY_API_KEY"],
    azure_endpoint=os.environ["AZURE_OPENAI_SECONDARY_ENDPOINT"],
    api_version="2024-02-01",
)

# In Azure OpenAI, `model` is the name of your deployment, which can
# differ between regions.
primary_deployment = os.environ["AZURE_OPENAI_PRIMARY_DEPLOYMENT"]
secondary_deployment = os.environ["AZURE_OPENAI_SECONDARY_DEPLOYMENT"]

messages = [{"role": "user", "content": "Explain disaster recovery for Azure OpenAI."}]

try:
    response = primary_client.chat.completions.create(
        model=primary_deployment,
        messages=messages,
    )
    print("Primary region response:")
    print(response.choices[0].message.content)
except OpenAIError as e:
    # OpenAIError is the base class of all SDK errors, so this also fails
    # over on errors another region cannot fix (for example, a bad request).
    print(f"Primary region failed with error: {e}")
    print("Failing over to secondary region...")
    try:
        response = secondary_client.chat.completions.create(
            model=secondary_deployment,
            messages=messages,
        )
        print("Secondary region response:")
        print(response.choices[0].message.content)
    except OpenAIError as e2:
        print(f"Secondary region also failed: {e2}")
```

Example output (the model's text will vary):
Primary region response:
Azure OpenAI disaster recovery involves deploying resources in multiple regions and implementing failover strategies to maintain service availability.
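The nested try/except in the failover example gets harder to follow as you add regions. One way to generalize it, sketched here with hypothetical names (`try_regions_in_order`, `AllRegionsFailedError`), is to pass an ordered list of (region name, zero-argument call) pairs and fail over only on exception types you treat as transient. It uses only the standard library, so the caller supplies the actual SDK call.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


class AllRegionsFailedError(Exception):
    """Raised when every region in the failover list has failed."""

    def __init__(self, errors: List[Tuple[str, BaseException]]):
        self.errors = errors  # (region_name, exception) pairs, in order tried
        summary = "; ".join(f"{name}: {err!r}" for name, err in errors)
        super().__init__(f"All regions failed: {summary}")


def try_regions_in_order(
    regions: Sequence[Tuple[str, Callable[[], T]]],
    retryable: Tuple[type, ...] = (Exception,),
) -> Tuple[str, T]:
    """Call each region in order and return (region_name, result) of the first success.

    Only exceptions in `retryable` move on to the next region; any other
    exception propagates immediately.
    """
    errors: List[Tuple[str, BaseException]] = []
    for name, call in regions:
        try:
            return name, call()
        except retryable as exc:
            errors.append((name, exc))
    raise AllRegionsFailedError(errors)
```

With the `openai` SDK, each call could be `lambda: primary_client.chat.completions.create(model=primary_deployment, messages=messages)`, and `retryable` could be `(openai.APIConnectionError, openai.RateLimitError, openai.InternalServerError)`, so that a `BadRequestError` is raised at once instead of being sent to every region.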
Common variations
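- Use asynchronous calls with asyncio and AsyncAzureOpenAI for non-blocking failover.
- Implement exponential backoff retry logic for transient errors before failing over. The openai client already retries some transient errors on its own (the max_retries setting, 2 by default), so count those attempts when adding your own.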
- Use different Azure OpenAI models or deployment names per region as needed.
```python
import asyncio
import os
from openai import AsyncAzureOpenAI, OpenAIError

async def call_with_failover():
    # AsyncAzureOpenAI is the asyncio client in openai>=1.0: you await the
    # regular create() method (the pre-1.0 acreate() method no longer exists).
    primary_client = AsyncAzureOpenAI(
        api_key=os.environ["AZURE_OPENAI_PRIMARY_API_KEY"],
        azure_endpoint=os.environ["AZURE_OPENAI_PRIMARY_ENDPOINT"],
        api_version="2024-02-01",
    )
    secondary_client = AsyncAzureOpenAI(
        api_key=os.environ["AZURE_OPENAI_SECONDARY_API_KEY"],
        azure_endpoint=os.environ["AZURE_OPENAI_SECONDARY_ENDPOINT"],
        api_version="2024-02-01",
    )
    primary_deployment = os.environ["AZURE_OPENAI_PRIMARY_DEPLOYMENT"]
    secondary_deployment = os.environ["AZURE_OPENAI_SECONDARY_DEPLOYMENT"]
    messages = [{"role": "user", "content": "Explain disaster recovery for Azure OpenAI."}]

    try:
        response = await primary_client.chat.completions.create(
            model=primary_deployment,
            messages=messages,
        )
        print("Primary region async response:")
        print(response.choices[0].message.content)
    except OpenAIError as e:
        print(f"Primary region async failed: {e}")
        print("Failing over to secondary region async...")
        try:
            response = await secondary_client.chat.completions.create(
                model=secondary_deployment,
                messages=messages,
            )
            print("Secondary region async response:")
            print(response.choices[0].message.content)
        except OpenAIError as e2:
            print(f"Secondary region async also failed: {e2}")

asyncio.run(call_with_failover())
```

Example output (the model's text will vary):
Primary region async response:
Azure OpenAI disaster recovery involves deploying resources in multiple regions and implementing failover strategies to maintain service availability.
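For the exponential backoff variation, here is a minimal sketch of retry with "full jitter" (a random delay up to an exponentially growing cap) around a single regional call. The function name and default values are illustrative assumptions, not Azure recommendations; `sleep` and `rng` are parameters only so the timing can be tested without actually waiting.

```python
import random
import time
from typing import Callable, Tuple, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    call: Callable[[], T],
    retryable: Tuple[type, ...] = (Exception,),
    max_attempts: int = 4,
    base_delay: float = 0.5,
    max_delay: float = 8.0,
    sleep: Callable[[float], None] = time.sleep,
    rng: Callable[[], float] = random.random,
) -> T:
    """Call `call()` up to `max_attempts` times with jittered exponential backoff.

    Before retry n (0-based), waits cap * rng() seconds, where
    cap = min(max_delay, base_delay * 2**n). The last error is re-raised
    so the caller can fail over to another region.
    """
    for attempt in range(max_attempts):
        try:
            return call()
        except retryable:
            if attempt == max_attempts - 1:
                raise  # out of attempts: let the caller fail over
            cap = min(max_delay, base_delay * 2 ** attempt)
            sleep(cap * rng())
    raise ValueError("max_attempts must be at least 1")
```

If the final attempt fails, the error propagates and you can move on to the next region. Because the `openai` client already retries some errors itself, you might construct it with `max_retries=0` when using your own loop, so that the two retry layers do not multiply the number of attempts.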
Troubleshooting
- If you receive APIConnectionError or APITimeoutError (the openai SDK's connection and timeout exceptions), verify your network and the Azure region's status.
- Check that the environment variables for endpoints, API keys, and deployments are correctly set.
- Use the Azure Service Health dashboard to monitor regional outages.
- Implement logging to capture failover events for audit and debugging.
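For the logging point, a minimal sketch using Python's standard logging module. The logger name and the from_region, to_region, and error_type fields are assumptions, chosen so that a log pipeline can filter failover events without parsing the message text.

```python
import logging

# Hypothetical logger name; use whatever fits your logging configuration.
logger = logging.getLogger("azure_openai.failover")


def log_failover(from_region: str, to_region: str, error: BaseException) -> None:
    """Record a failover event as a WARNING with structured extra fields."""
    error_type = type(error).__name__
    logger.warning(
        "Failing over from %s to %s after %s: %s",
        from_region, to_region, error_type, error,
        # `extra` attaches these keys as attributes on the LogRecord.
        extra={"from_region": from_region, "to_region": to_region,
               "error_type": error_type},
    )
```

You would call log_failover(...) in the except block just before switching to the secondary client.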
Key Takeaways
- Deploy Azure OpenAI resources in multiple regions for high availability.
- Use retry and failover logic in your client code to handle regional outages.
- Monitor Azure service health and logs to detect and respond to failures quickly.