How to Intermediate · 3 min read

Azure OpenAI multi-region deployment

Quick answer
To deploy Azure OpenAI across multiple regions, create separate resource instances in each desired Azure region and configure your application to route requests based on user location or failover needs. Use Azure's Traffic Manager or Front Door to manage multi-region traffic and ensure high availability.

PREREQUISITES

  • Azure subscription with Azure OpenAI access
  • Azure CLI installed and configured
  • Python 3.8+
  • pip install azure-identity azure-ai-openai

Setup

Install the Azure SDK packages and set environment variables for authentication.

  • Install Azure CLI and login: az login
  • Install Python packages: pip install azure-identity azure-ai-openai
  • Set environment variables for AZURE_CLIENT_ID, AZURE_TENANT_ID, and AZURE_CLIENT_SECRET for service principal authentication.
bash
pip install azure-identity azure-ai-openai
output
Collecting azure-identity
Collecting azure-ai-openai
Successfully installed azure-ai-openai-1.x azure-identity-1.x

Step by step

Create Azure OpenAI resource instances in multiple regions via Azure Portal or CLI. Then use the Azure SDK to connect to each region's endpoint and route requests accordingly.

python
import os
from azure.identity import DefaultAzureCredential
from azure.ai.openai import OpenAIClient

# Define endpoints for multiple regions
endpoints = {
    "eastus": os.environ["AZURE_OPENAI_ENDPOINT_EASTUS"],
    "westus": os.environ["AZURE_OPENAI_ENDPOINT_WESTUS"]
}

# Authenticate once
credential = DefaultAzureCredential()

# Create clients for each region
clients = {region: OpenAIClient(endpoint, credential) for region, endpoint in endpoints.items()}

# Example function to send prompt to a specific region

def query_region(region: str, prompt: str) -> str:
    client = clients[region]
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}]
    )
    return response.choices[0].message.content

# Usage example
user_region = "eastus"  # This could be determined dynamically
result = query_region(user_region, "Explain multi-region deployment in Azure OpenAI.")
print(result)
output
Azure OpenAI multi-region deployment involves creating separate resource instances in each region and routing requests to the nearest or healthiest endpoint to ensure low latency and high availability.

Common variations

You can implement asynchronous calls using asyncio and azure-ai-openai async client. For traffic management, use Azure Traffic Manager or Front Door to automatically route user requests to the closest region.

Switch models by changing the model parameter in the chat.completions.create call.

python
import asyncio
import os
from azure.identity.aio import DefaultAzureCredential
from azure.ai.openai.aio import OpenAIClient

async def async_query_region(region: str, prompt: str):
    endpoint = os.environ[f"AZURE_OPENAI_ENDPOINT_{region.upper()}"]
    credential = DefaultAzureCredential()
    async with OpenAIClient(endpoint, credential) as client:
        response = await client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[{"role": "user", "content": prompt}]
        )
        return response.choices[0].message.content

async def main():
    result = await async_query_region("eastus", "Async multi-region Azure OpenAI example.")
    print(result)

asyncio.run(main())
output
Async multi-region Azure OpenAI example allows scalable, low-latency responses by leveraging multiple regional endpoints.

Troubleshooting

  • If authentication fails, verify your service principal credentials and environment variables.
  • If you get endpoint not found errors, confirm the Azure OpenAI resource exists in the specified region and the endpoint URL is correct.
  • For latency issues, ensure your traffic manager or front door is configured properly to route users to the nearest region.

Key Takeaways

  • Create separate Azure OpenAI resources in each target region for multi-region deployment.
  • Use Azure SDK clients with region-specific endpoints and credentials to route requests.
  • Leverage Azure Traffic Manager or Front Door for automatic multi-region traffic routing.
  • Implement async calls for scalable, low-latency multi-region usage.
  • Verify authentication and endpoint URLs to avoid common deployment errors.
Verified 2026-04 · gpt-4o-mini
Verify ↗