Structured Course

Azure Openai

From first install to production patterns. Every lesson is standalone — jump to what you need, or work through from beginner to advanced.

147 lessons 3 levels Beginner → Advanced

Beginner

49 lessons · 7 chapters

See all →

What Azure OpenAI Is and Why Enterprises Use It 7

Setup and Authentication 7

Making API Calls 7

+4 more chapters

Start Beginner →

Intermediate

49 lessons · 7 chapters

See all →

GPT-4.1 and Latest Models on Azure 7

Provisioned Throughput Units 7

Batch API on Azure 7

+4 more chapters

Start Intermediate →

Advanced

49 lessons · 7 chapters

See all →

Enterprise Azure OpenAI Architecture 7

Azure OpenAI Compliance 7

LangChain and LlamaIndex on Azure 7

+4 more chapters

Start Advanced →

Full Course Contents

Beginner

49 lessons

1 What Azure OpenAI Is and Why Enterprises Use It 7

Azure OpenAI vs OpenAI direct: compliance difference Azure OpenAI routes requests through Microsoft's data centers and complies with enterprise regulatory frameworks, while OpenAI direct accesses OpenAI's infrastructure.

Data processing on Azure infrastructure Send text to Azure OpenAI for processing and receive structured responses using the AzureOpenAI client with proper authentication.

Enterprise agreements and BAA Azure OpenAI deployments created under Enterprise Agreements and Business Associate Agreements enable HIPAA compliance and data residency guarantees required by regulated industries.

Available models: GPT-4.1, o1, o3, DALL-E Learn which AI models are available in your Azure OpenAI deployment and how to request them by name.

When Azure OpenAI is Required Azure OpenAI is required when your organization needs enterprise compliance, Virtual Networks, managed identity authentication, or committed pricing instead of OpenAI's public API.

Azure AI Foundry: the new unified portal Azure AI Foundry is the unified portal where you deploy models, manage deployments, and get the credentials needed to connect your Python code to Azure OpenAI.

Azure OpenAI vs Azure AI Services Azure OpenAI provides GPT models via OpenAI's API on Azure infrastructure, while Azure AI Services is a broader suite of pre-built AI capabilities managed directly by Microsoft.

2 Setup and Authentication 7

Azure subscription and resource creation Set up an Azure subscription and create an OpenAI resource to obtain the API endpoint and key required for authentication.

Azure AI Foundry: deployment creation Create a model deployment in Azure AI Foundry that your Python code will connect to via the AzureOpenAI client.

Deployment name vs model name distinction In Azure OpenAI, you specify a deployment name (not a model name) when making API calls, because Azure manages multiple independent deployments of the same model.

AZURE_OPENAI_API_KEY and endpoint Set up Azure OpenAI authentication by configuring your API key and endpoint URL before making any API calls.

AzureOpenAI() client setup Initialize the AzureOpenAI client with endpoint, API key, and API version to authenticate requests to Azure OpenAI deployments.

api_version parameter: critical The <code>api_version</code> parameter pins which Azure OpenAI API schema you're using: getting it wrong causes silent field mismatches or cryptic 404 errors.

Verifying your first Azure OpenAI API call Make your first successful API call to Azure OpenAI and verify authentication and response handling work correctly.

3 Making API Calls 7

AzureOpenAI() client AzureOpenAI() creates an authenticated client to call your Azure OpenAI deployment via the OpenAI Python SDK.

azure_endpoint parameter The <code>azure_endpoint</code> parameter tells the Azure OpenAI client where your Azure resource is hosted, replacing the model selection logic you'd use with OpenAI's public API.

azure_deployment: deployment name The <code>model</code> parameter in AzureOpenAI requests must match your Azure deployment name, not the underlying model name.

Chat completions call Send a message to Azure OpenAI and get a model response using the chat completions endpoint.

Response format: same as OpenAI Azure OpenAI returns the exact same response structure as OpenAI's API, so your parsing code doesn't change.

Streaming responses Use streaming to receive Azure OpenAI responses token-by-token instead of waiting for the complete message.

Error handling Catch and handle Azure OpenAI API errors like authentication failures, rate limits, and model unavailability without crashing your application.

4 Deployments vs Models 7

What a deployment is A deployment is your named instance of an Azure OpenAI model that you pay for and call by name, not by model ID.

Creating deployments in Azure AI Foundry Deploy an OpenAI model to Azure AI Foundry so you can call it via the AzureOpenAI client.

Deployment name in code The <code>model</code> parameter in Azure OpenAI API calls must be your deployment name, not the model name (like gpt-4 or gpt-35-turbo).

Capacity: tokens per minute Azure OpenAI deployments have token-per-minute (TPM) rate limits that throttle requests when exceeded, and you must check your deployment's quota in the Azure portal before deploying to production.

Multiple deployments of same model Route API calls to different Azure OpenAI deployments of the same model to distribute load and enable gradual rollouts.

Deployment version management Control which model version your Azure OpenAI deployment uses by specifying the correct <code>api_version</code> parameter in your client initialization.

Model upgrade path Switch between deployed models in Azure OpenAI by changing the deployment name without rewriting your client code.

5 Azure-Specific Features 7

Content filtering: mandatory Azure OpenAI automatically screens both user input and model output for harmful content and returns filter results you must handle in production.

Custom content policies Configure content filters and safety policies for Azure OpenAI API calls to block or flag harmful requests before they reach the model.

Responsible AI controls Enable content filtering and safety checks on Azure OpenAI deployments to prevent harmful outputs and maintain compliance.

Private endpoint support Route Azure OpenAI API calls through a private VNet endpoint instead of the public internet to meet compliance and security requirements.

VNet Integration Route Azure OpenAI API calls through a Virtual Network to keep traffic private and meet enterprise security requirements.

Azure AD authentication Authenticate to Azure OpenAI using Azure Active Directory credentials instead of API keys.

Monitoring in Azure Portal Track your Azure OpenAI API usage, costs, and performance metrics directly from the Azure Portal dashboard without writing code.

6 Embeddings on Azure OpenAI 7

text-embedding-3-large deployment Generate vector embeddings from text using Azure OpenAI's text-embedding-3-large model to enable semantic search and similarity matching.

Embeddings call format Convert text into numerical vectors using Azure OpenAI's embeddings API to enable semantic search, clustering, and similarity comparisons.

Azure Embedding vs OpenAI Direct Azure OpenAI and OpenAI direct both create embeddings, but Azure routes through your organization's Azure tenant while OpenAI goes directly to OpenAI's endpoints.

Batch embedding patterns Generate vector embeddings for multiple texts in a single Azure OpenAI API call using the batch embeddings endpoint.

Dimension reduction support Azure OpenAI's embeddings API reduces high-dimensional text into fixed-size vectors for semantic search and clustering without explicit dimensionality reduction code.

Cost comparison Understand how Azure OpenAI pricing differs from standard OpenAI based on model, region, and usage tier.

Regional availability Azure OpenAI deployments are region-locked, so you must route requests to the endpoint matching your model deployment's region.

7 Common Errors and Fixes 7

Wrong api_version: 404 errors Azure OpenAI API requests fail with 404 when the api_version parameter doesn't match your deployment's supported versions.

Deployment name mismatch Azure OpenAI requires the deployment name (not the model name) in the model parameter of chat.completions.create().

Content filter rejection Azure OpenAI's content filter can reject requests or flag responses, and you need to handle the <code>content_filter_result</code> object to understand why.

Capacity exceeded: 429 A 429 HTTP status code means Azure OpenAI has temporarily exhausted capacity on your deployment and you must retry your request.

Endpoint URL format Azure OpenAI requires a region-specific endpoint URL that differs from the standard OpenAI API base URL.

Azure AD token errors Diagnose and fix Azure AD authentication failures when the AzureOpenAI client cannot validate your credentials.

Region model availability Check which AI models are deployed in your Azure region before making API calls to avoid 404 errors.

Intermediate

49 lessons

1 GPT-4.1 and Latest Models on Azure 7

GPT-4.1 deployment on Azure Deploy and call GPT-4.1 on Azure OpenAI Service using the AzureOpenAI client with your Azure credentials and deployment name.

GPT-4.1 mini for cost efficiency Use GPT-4.1 mini deployment in Azure OpenAI to reduce per-token costs by 95% while maintaining reasoning capability for most production workloads.

o1 and o3 on Azure: reasoning models Use OpenAI's reasoning models (o1, o3) through Azure OpenAI to solve complex problems that require step-by-step logical thinking before responding.

o3-mini deployment Deploy and query OpenAI's o3-mini reasoning model through Azure OpenAI with the AzureOpenAI client.

Model availability by region Query which language models are deployed in each Azure region and their deployment names to route requests correctly.

Model version pinning Pin specific Azure OpenAI model deployment versions in your API calls to prevent silent behavior changes when the service updates the underlying model.

Migration from GPT-4o to GPT-4.1 Switch your Azure OpenAI deployment from GPT-4o to GPT-4.1 by updating the model parameter and verifying compatibility with structured outputs and vision features.

2 Provisioned Throughput Units 7

What PTU provides: guaranteed capacity PTU (Provisioned Throughput Units) reserves fixed compute capacity on Azure OpenAI, guaranteeing token processing rate and stable latency regardless of demand spikes.

PTU sizing calculator Calculate Provisioned Throughput Units (PTUs) needed for your Azure OpenAI deployment based on expected token throughput and latency requirements.

PTU vs pay-as-you-go decision Choose between Provisioned Throughput Units (PTU) for predictable costs and high volume, or pay-as-you-go for variable workloads and testing.

PTU reservation and commitment Reserve Provisioned Throughput Units (PTUs) to lock in predictable pricing and avoid per-token overage costs when running high-volume Azure OpenAI workloads.

Monitoring PTU utilization Query Azure OpenAI's Provisioned Throughput Unit consumption to prevent throttling and right-size your deployment costs.

Overflow handling for PTU Handle token overflow gracefully when Provisioned Throughput Unit requests exceed allocated capacity by implementing retry logic and fallback strategies.

PTU cost analysis Calculate and compare per-request costs when using Provisioned Throughput Units (PTUs) versus pay-as-you-go pricing in Azure OpenAI.

3 Batch API on Azure 7

Batch API for 50% savings Use Azure OpenAI's Batch API to process non-urgent requests asynchronously and reduce costs by up to 50%.

Input file format for batch Azure OpenAI batch processing requires JSONL files with a specific message structure: one request per line, no array wrapper.

Job submission and monitoring Submit batch processing jobs to Azure OpenAI and poll their completion status without blocking your application.

Retrieving batch results Retrieve completed batch job results and error details from Azure OpenAI using the batch ID.

Supported models for batch Azure OpenAI batch processing only supports specific model deployments; understand which models qualify and why.

Use cases for batch processing Azure OpenAI batch processing lets you submit large request volumes asynchronously at lower cost, trading latency for 50% price reduction when time-sensitivity is low.

Cost comparison: batch vs realtime Azure OpenAI Batch API processes thousands of requests at 50% discount but with 24-hour latency, while real-time requests cost full price and execute instantly.

4 Azure AI Search Integration 7

Azure AI Search as vector store Use Azure AI Search to store and retrieve embeddings generated by Azure OpenAI, enabling semantic search across your documents.

Hybrid search: vector + keyword Combine Azure OpenAI embeddings with keyword search to retrieve documents using both semantic similarity and exact term matching.

Azure OpenAI + AI Search RAG pattern Retrieve grounded answers from your own data by chaining Azure OpenAI chat completions with Azure AI Search vector queries.

Semantic Ranking with Azure OpenAI Use Azure OpenAI embeddings to rank search results by semantic relevance rather than keyword matching.

Data ingestion pipeline Build a Python pipeline that reads documents from Azure Blob Storage, chunks them intelligently, and prepares them for embedding and retrieval.

On Your Data feature Use Azure OpenAI's On Your Data feature to ground LLM responses in your own documents without fine-tuning, via the data_sources parameter.

Production RAG on Azure Build retrieval-augmented generation on Azure OpenAI by combining embeddings API, vector search, and chat completions in a single production pipeline.

5 High Availability Architecture 7

Multi-region deployment Route API requests across multiple Azure OpenAI deployments in different regions to improve availability and reduce latency.

Azure Front Door for global routing Route Azure OpenAI requests through Azure Front Door to reduce latency, enable failover across regions, and apply global load balancing policies.

Failover configuration Route API requests across multiple Azure OpenAI deployments automatically when one fails using the AzureOpenAI client with fallback endpoints.

PTU cross-region failover Configure Azure OpenAI clients to automatically retry requests across regions when a PTU deployment becomes unavailable.

SLA guarantees Azure OpenAI provides tiered SLA commitments based on quota allocation, with uptime guarantees ranging from 99.9% for provisioned throughput to service-level credits for standard deployments.

Load balancer configuration Route API requests across multiple Azure OpenAI deployments using a load balancer pattern to distribute traffic and prevent single-endpoint bottlenecks.

Disaster recovery Implement multi-region failover and retry logic to keep your Azure OpenAI application running when a deployment or region becomes unavailable.

6 Authentication Deep Dive 7

API key vs Azure AD authentication Choose between shared API keys and federated Azure AD identities to authenticate with Azure OpenAI, trading simplicity for security and scalability.

Managed identity configuration Use Azure Managed Identity to authenticate your application to Azure OpenAI without storing API keys in code or environment variables.

Service principal setup Create and authenticate an Azure service principal to programmatically access Azure OpenAI without user credentials.

Key rotation strategy Implement graceful API key rotation in Azure OpenAI without dropping requests by maintaining dual keys and switching with zero downtime.

Network security configuration Configure Azure OpenAI clients to enforce TLS, disable SSL verification selectively, and use private endpoints for secure network isolation.

Private Endpoint Setup Configure Azure OpenAI to accept traffic only through a private endpoint, removing public internet access to your deployment.

Zero-trust network architecture Authenticate Azure OpenAI calls without storing credentials in code by using managed identities and environment-based configuration.

7 Cost and Monitoring 7

Azure Cost Management for OpenAI Track and optimize per-request costs for Azure OpenAI deployments using usage metrics and cost allocation tags.

Token usage per deployment Extract and track prompt and completion token counts from Azure OpenAI API responses to monitor per-deployment usage and costs.

Budget alerts configuration Set up spending thresholds and notifications in Azure OpenAI to prevent unexpected bills from runaway API calls.

Cost allocation tags Use the <code>headers</code> parameter in Azure OpenAI API calls to attach cost allocation tags for billing and chargeback tracking across teams and projects.

PTU vs pay-as-you-go comparison Understand when to commit to Provisioned Throughput Units (PTU) versus paying per token for Azure OpenAI deployments.

Optimizing deployment capacity Use deployment-level rate limits and quota management to prevent token throttling and ensure predictable API performance under production load.

ROI calculation framework Calculate the return on investment of Azure OpenAI API calls by tracking token usage, costs, and business outcomes in production applications.

Advanced

49 lessons

1 Enterprise Azure OpenAI Architecture 7

Azure AI Foundry for enterprise Use Azure AI Foundry to deploy, monitor, and govern LLM applications across your organization with built-in compliance, cost tracking, and multi-tenant isolation.

Hub and project model Azure OpenAI's hub-and-project isolation model partitions API deployments, quotas, and audit logs for multi-team or multi-environment control.

Multi-region deployment for HA Route Azure OpenAI API calls across multiple regions with automatic failover to maintain availability when one region degrades or throttles.

Private endpoint configuration Configure Azure OpenAI clients to route traffic through private endpoints instead of public internet endpoints to meet network isolation requirements.

Content filtering policy management Configure and enforce Azure OpenAI content filtering policies to block, flag, or allow specific content categories in chat completions.

Cross-account governance Enforce tenant isolation and role-based access control across multiple Azure subscriptions when deploying Azure OpenAI models.

Landing zone for Azure OpenAI Initialize and authenticate the AzureOpenAI client to establish a secure connection to your Azure OpenAI deployment.

2 Azure OpenAI Compliance 7

HIPAA BAA for Azure OpenAI Azure OpenAI supports HIPAA-covered entities through Business Associate Agreements, but you must explicitly enable compliance features and understand what Azure does and doesn't cover under the BAA.

Data residency configuration Control where your prompts and completions are processed and stored by specifying Azure region and API version in the AzureOpenAI client.

Zero data retention setup Configure Azure OpenAI to disable data retention and immediately delete conversation logs by setting data_in_at_rest_encryption_enabled and using the correct API version.

SOC2 and ISO compliance Azure OpenAI enforces SOC2 Type II and ISO 27001 compliance through audit logging, data residency controls, and encryption: configure your client to capture and retain logs for regulatory proof.

GDPR Compliance and Data Residency Configure Azure OpenAI deployments in GDPR-compliant regions and implement request logging patterns that satisfy EU data residency requirements without exposing sensitive user data.

Azure Policy for AI governance Enforce compliance rules and audit AI API usage across your Azure OpenAI deployments using Azure Policy definitions and assignments.

Enterprise compliance documentation Extract and structure compliance audit trails from Azure OpenAI API calls to satisfy regulatory requirements without manual log parsing.

3 LangChain and LlamaIndex on Azure 7

AzureChatOpenAI in LangChain Use LangChain's AzureChatOpenAI to integrate Azure OpenAI deployments with chains, agents, and RAG pipelines while managing authentication and token streaming at scale.

AzureOpenAIEmbeddings in LangChain Generate vector embeddings from text using Azure OpenAI's embedding models through LangChain's abstraction layer, enabling semantic search and retrieval augmented generation at scale.

Azure OpenAI in LlamaIndex Use Azure OpenAI as the LLM backbone in LlamaIndex RAG pipelines with explicit deployment configuration and managed indexing.

Azure AI Search in LangChain Integrate Azure AI Search as a retriever in LangChain LCEL to enable hybrid semantic and keyword search over your documents.

Building a RAG Pipeline with Azure OpenAI and Cognitive Search Combine Azure OpenAI's chat completions with Azure Cognitive Search to retrieve and augment responses with your own documents in a single production pipeline.

LangSmith with Azure OpenAI Instrument Azure OpenAI API calls with LangSmith to trace, debug, and monitor LLM behavior in production.

Framework vs native Azure SDK Choose between LangChain/LlamaIndex abstraction layers and direct AzureOpenAI SDK calls based on control needs, latency requirements, and cost visibility.

4 Content Safety and Responsible AI 7

Azure Content Safety service Use Azure Content Safety to analyze text and images for harmful content categories before processing through your LLM pipeline.

Content filter categories and severity Azure OpenAI content filters flag harmful content across categories (hate, sexual, violence, self-harm) with configurable severity thresholds in the response.

Custom content policies Apply custom content filtering rules to Azure OpenAI API calls by configuring filtering policies at the deployment and request level.

Groundedness detection Use Azure OpenAI with prompt engineering and external knowledge verification to detect whether model responses are grounded in factual sources or hallucinated.

Protected material detection Use Azure OpenAI's content filtering to detect and block requests containing protected material like violence, hate speech, and sexual content before processing.

Indirect prompt injection detection Detect when user input contains adversarial prompts designed to override system instructions by analyzing message patterns and content boundaries before sending to Azure OpenAI.

Responsible AI dashboard Monitor content filtering decisions, token usage, and model behavior through Azure OpenAI's content filter metrics and structured logging endpoints.

5 Advanced Azure OpenAI Patterns 7

Multi-model routing on Azure Route requests to different Azure OpenAI model deployments based on prompt characteristics, latency requirements, or cost targets using conditional logic and fallback patterns.

Azure Functions with OpenAI Deploy serverless Python functions on Azure that call Azure OpenAI endpoints with proper identity-based authentication and cold-start optimization.

Logic Apps integration Trigger Azure OpenAI completions from Logic Apps workflows using HTTP connectors and managed identity authentication.

Azure OpenAI for enterprise RAG Use Azure OpenAI's chat completions with vector search to build retrieval-augmented generation systems that scale across enterprise deployments.

Semantic Kernel with Azure OpenAI Use Microsoft's Semantic Kernel to compose orchestrated AI workflows that chain Azure OpenAI calls with memory, plugins, and planning without building custom orchestration logic.

Azure Bot Service integration with Azure OpenAI Route Azure Bot Service conversations through Azure OpenAI using the AzureOpenAI client to build stateful, context-aware conversational agents.

Event-driven Azure OpenAI pipelines Build scalable request-response pipelines using Azure Event Grid to queue, deduplicate, and route Azure OpenAI API calls with automatic retry and dead-letter handling.

6 Performance Optimization 7

Streaming for latency reduction Use Azure OpenAI streaming to receive chat completions incrementally, reducing perceived latency by up to 70% compared to waiting for the full response.

Prompt caching savings Use prompt caching to reduce token costs and latency by storing frequently repeated system prompts and context on Azure OpenAI's servers.

Connection pooling Reuse HTTP connections across multiple API calls to reduce latency and improve throughput when making repeated requests to Azure OpenAI.

Async SDK usage Use AsyncAzureOpenAI to make non-blocking API calls that scale to hundreds of concurrent requests without threading complexity.

Retry strategy configuration Configure exponential backoff and maximum retry attempts to handle transient failures and rate limits in Azure OpenAI API calls.

Timeout tuning Configure socket, request, and retry timeouts on AzureOpenAI client to prevent silent failures and handle long-running completions without killing valid requests.

Load Testing Azure OpenAI Systematically measure throughput, latency, and cost under concurrent load against Azure OpenAI deployments to validate capacity and identify bottlenecks before production traffic.

7 Operations and Governance 7

Azure Monitor integration Stream Azure OpenAI API metrics and errors to Azure Monitor for production observability and cost tracking.

Azure Log Analytics for OpenAI Stream Azure OpenAI API calls and token usage to Log Analytics workspace for production monitoring, cost tracking, and compliance auditing.

Diagnostic settings Enable Azure Monitor integration to capture request/response logs, latency metrics, and token usage for your Azure OpenAI deployments.

Operational runbook: Production deployment and incident response Build production-grade error handling, monitoring, and failover logic for Azure OpenAI deployments that stay online.

Incident Response for Azure OpenAI Detect, log, and recover from Azure OpenAI API failures with structured error handling, retry strategies, and circuit breaker patterns to maintain service reliability.

Change management for model upgrades Safely upgrade your Azure OpenAI model deployments without breaking production by validating compatibility, managing rollback states, and coordinating deployment names across your stack.

Multi-team governance model Implement role-based access control and cost allocation across teams using Azure OpenAI's managed identity and subscription-level RBAC to prevent credential sprawl and enforce spending guardrails.