Azure OpenAI disaster recovery

Quick answer

Implement disaster recovery for Azure OpenAI by deploying resources in at least two Azure regions and failing over from the primary region to the secondary in your client code. Retry transient errors before failing over, and monitor service health so that you can detect regional outages quickly.

PREREQUISITES

Python 3.8+
Azure OpenAI API key (each regional resource has its own key)
pip install "openai>=1.0"
Azure subscription with Azure OpenAI resources deployed in at least two regions

Set up multi-region Azure OpenAI

To ensure disaster recovery, deploy your Azure OpenAI resource in at least two Azure regions. This allows your application to switch to a secondary region if the primary region experiences downtime.

Configure environment variables for each region's endpoint and deployment name.

python

import os
from openai import AzureOpenAI

# Primary region client
primary_client = AzureOpenAI(
    api_key=os.environ["AZURE_OPENAI_API_KEY"],
    azure_endpoint=os.environ["AZURE_OPENAI_PRIMARY_ENDPOINT"],
    api_version="2024-02-01"
)

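# Secondary region client
# Each Azure OpenAI resource has its own key. AZURE_OPENAI_SECONDARY_API_KEY is a
# hypothetical variable name; the shared key is used as a fallback if it is unset.
secondary_client = AzureOpenAI(
    api_key=os.environ.get("AZURE_OPENAI_SECONDARY_API_KEY", os.environ["AZURE_OPENAI_API_KEY"]),
    azure_endpoint=os.environ["AZURE_OPENAI_SECONDARY_ENDPOINT"],
    api_version="2024-02-01"
)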

print("Clients for primary and secondary regions initialized.")

output

Clients for primary and secondary regions initialized.

Step-by-step disaster recovery code

This example demonstrates how to call the Azure OpenAI chat completion endpoint with automatic failover to a secondary region if the primary region fails.

python

import os
from openai import AzureOpenAI
from openai import OpenAIError

primary_client = AzureOpenAI(
    api_key=os.environ["AZURE_OPENAI_API_KEY"],
    azure_endpoint=os.environ["AZURE_OPENAI_PRIMARY_ENDPOINT"],
    api_version="2024-02-01"
)

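# Each Azure OpenAI resource has its own key. AZURE_OPENAI_SECONDARY_API_KEY is a
# hypothetical variable name; the shared key is used as a fallback if it is unset.
secondary_client = AzureOpenAI(
    api_key=os.environ.get("AZURE_OPENAI_SECONDARY_API_KEY", os.environ["AZURE_OPENAI_API_KEY"]),
    azure_endpoint=os.environ["AZURE_OPENAI_SECONDARY_ENDPOINT"],
    api_version="2024-02-01"
)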

primary_deployment = os.environ["AZURE_OPENAI_PRIMARY_DEPLOYMENT"]
secondary_deployment = os.environ["AZURE_OPENAI_SECONDARY_DEPLOYMENT"]

messages = [{"role": "user", "content": "Explain disaster recovery for Azure OpenAI."}]

try:
    response = primary_client.chat.completions.create(
        model=primary_deployment,
        messages=messages
    )
    print("Primary region response:")
    print(response.choices[0].message.content)
except OpenAIError as e:
    print(f"Primary region failed with error: {e}")
    print("Failing over to secondary region...")
    try:
        response = secondary_client.chat.completions.create(
            model=secondary_deployment,
            messages=messages
        )
        print("Secondary region response:")
        print(response.choices[0].message.content)
    except OpenAIError as e2:
        print(f"Secondary region also failed: {e2}")

output

Primary region response:
Azure OpenAI disaster recovery involves deploying resources in multiple regions and implementing failover strategies to maintain service availability.
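
The nested try/except pattern above can be generalized to an ordered list of regions. The following is a minimal sketch, not part of the original example: the function name call_with_failover, the logger name, and the (region_name, callable) pair format are all assumptions chosen for illustration. It takes plain callables, so it does not depend on the openai package itself.

```python
import logging
from typing import Callable, Sequence, Tuple, TypeVar

T = TypeVar("T")
logger = logging.getLogger("failover")  # hypothetical logger name


def call_with_failover(
    attempts: Sequence[Tuple[str, Callable[[], T]]],
    retry_on: Tuple[type, ...] = (Exception,),
) -> Tuple[str, T]:
    """Try each (region_name, call) pair in order; return the first success.

    Returns the name of the region that answered together with its result,
    and raises RuntimeError if every region fails.
    """
    last_error = None
    for region, call in attempts:
        try:
            return region, call()
        except retry_on as exc:
            # Record the failover event for later audit and debugging.
            logger.warning("Region %s failed: %s", region, exc)
            last_error = exc
    raise RuntimeError("All regions failed") from last_error
```

With the clients from the example above, each callable could be a lambda that wraps primary_client.chat.completions.create(...) or secondary_client.chat.completions.create(...), and retry_on could be narrowed to (OpenAIError,) so that unrelated exceptions are not swallowed.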

Common variations

Use asynchronous calls with asyncio and AsyncAzureOpenAI for non-blocking failover.
Implement exponential backoff retry logic for transient errors before failing over.
Use different Azure OpenAI models or deployment names per region as needed.

python

import asyncio
import os
from openai import AsyncAzureOpenAI, OpenAIError

async def call_with_failover():
    # openai>=1.0 has no acreate(); awaitable calls need the async client.
    primary_client = AsyncAzureOpenAI(
        api_key=os.environ["AZURE_OPENAI_API_KEY"],
        azure_endpoint=os.environ["AZURE_OPENAI_PRIMARY_ENDPOINT"],
        api_version="2024-02-01"
    )
    # Each Azure OpenAI resource has its own key. AZURE_OPENAI_SECONDARY_API_KEY is a
    # hypothetical variable name; the shared key is used as a fallback if it is unset.
    secondary_client = AsyncAzureOpenAI(
        api_key=os.environ.get("AZURE_OPENAI_SECONDARY_API_KEY", os.environ["AZURE_OPENAI_API_KEY"]),
        azure_endpoint=os.environ["AZURE_OPENAI_SECONDARY_ENDPOINT"],
        api_version="2024-02-01"
    )

    primary_deployment = os.environ["AZURE_OPENAI_PRIMARY_DEPLOYMENT"]
    secondary_deployment = os.environ["AZURE_OPENAI_SECONDARY_DEPLOYMENT"]

    messages = [{"role": "user", "content": "Explain disaster recovery for Azure OpenAI."}]

    try:
        response = await primary_client.chat.completions.create(
            model=primary_deployment,
            messages=messages
        )
        print("Primary region async response:")
        print(response.choices[0].message.content)
    except OpenAIError as e:
        print(f"Primary region async failed: {e}")
        print("Failing over to secondary region async...")
        try:
            response = await secondary_client.chat.completions.create(
                model=secondary_deployment,
                messages=messages
            )
            print("Secondary region async response:")
            print(response.choices[0].message.content)
        except OpenAIError as e2:
            print(f"Secondary region async also failed: {e2}")

asyncio.run(call_with_failover())

output

Primary region async response:
Azure OpenAI disaster recovery involves deploying resources in multiple regions and implementing failover strategies to maintain service availability.
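
The exponential backoff variation listed above could look roughly like the sketch below. It is an illustration under stated assumptions rather than a definitive implementation: retry_with_backoff, its parameter names, and the default delays are hypothetical, and the injectable sleep argument exists only so the helper can be exercised without real waiting. The openai client also accepts a max_retries option that handles some transient errors internally, so a wrapper like this is mainly useful when you want explicit control before failing over.

```python
import random
import time
from typing import Callable, Tuple, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    call: Callable[[], T],
    retry_on: Tuple[type, ...] = (Exception,),
    max_attempts: int = 3,
    base_delay: float = 0.5,
    max_delay: float = 8.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `call` up to max_attempts times, doubling the wait after each failure."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return call()
        except retry_on:
            if attempt == max_attempts - 1:
                # Out of attempts: let the caller decide whether to fail over.
                raise
            delay = min(max_delay, base_delay * (2 ** attempt))
            # Add jitter so many clients do not retry in lockstep.
            sleep(delay + random.uniform(0, delay / 2))
    raise AssertionError("unreachable")
```

A caller would typically wrap the primary-region request in this helper and switch to the secondary region only after it raises.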

Troubleshooting

If you receive openai.APIConnectionError or openai.APITimeoutError, verify your network and the status of the affected Azure region.
Check that environment variables for endpoints and deployments are correctly set.
Use the Azure Service Health dashboard to monitor regional outages.
Implement logging to capture failover events for audit and debugging.
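
The environment variable check mentioned above can be automated at startup. This is a small sketch that assumes the variable names used in this article's examples; the function name find_config_problems and the https:// check on endpoints are illustrative choices, not requirements of the SDK.

```python
import os
from typing import List, Mapping

# Variable names taken from the examples in this article.
REQUIRED_VARS = (
    "AZURE_OPENAI_API_KEY",
    "AZURE_OPENAI_PRIMARY_ENDPOINT",
    "AZURE_OPENAI_SECONDARY_ENDPOINT",
    "AZURE_OPENAI_PRIMARY_DEPLOYMENT",
    "AZURE_OPENAI_SECONDARY_DEPLOYMENT",
)


def find_config_problems(env: Mapping[str, str] = os.environ) -> List[str]:
    """Return a human-readable problem for each missing or malformed variable."""
    problems = []
    for name in REQUIRED_VARS:
        value = env.get(name, "").strip()
        if not value:
            problems.append(f"{name} is missing or empty")
        elif name.endswith("_ENDPOINT") and not value.startswith("https://"):
            problems.append(f"{name} should be an https:// URL")
    return problems


if __name__ == "__main__":
    for problem in find_config_problems():
        print(problem)
```

Running this before creating the clients turns a confusing KeyError or connection failure into a short list of settings to fix.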

Key Takeaways

Deploy Azure OpenAI resources in multiple regions for high availability.
Use retry and failover logic in your client code to handle regional outages.
Monitor Azure service health and logs to detect and respond to failures quickly.

Verified 2026-04 · gpt-4o, gpt-4o-mini
