
Azure OpenAI Cheat Sheet: API & Models Reference

version 2024-12

Deploy OpenAI models on Azure infrastructure

Environment variables: AZURE_OPENAI_API_KEY, AZURE_OPENAI_ENDPOINT, AZURE_OPENAI_API_VERSION
install: pip install openai (plus azure-identity for managed identity)
core imports
python
from openai import AzureOpenAI
import os
Mental model

OpenAI API wrapper that routes requests to Azure infrastructure instead of OpenAI directly.

Like renting an apartment (OpenAI) vs. owning a house in your own neighborhood (Azure): same appliances, but they sit inside your Azure subscription, behind your network and compliance controls.

Common Patterns

01 Basic Authentication & Chat
Simple chat completions with Azure deployment
python
from openai import AzureOpenAI
import os

client = AzureOpenAI(
    api_key=os.environ["AZURE_OPENAI_API_KEY"],
    api_version="2024-10-01-preview",
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"]
)

response = client.chat.completions.create(
    model="my-gpt4o-deployment",  # your deployment name, not the model name
    messages=[
        {"role": "system", "content": "You are helpful."},
        {"role": "user", "content": "Explain RAG."}
    ]
)

print(response.choices[0].message.content)
output Retrieval-Augmented Generation (RAG) combines...
The model parameter must match your Azure deployment name, NOT the underlying model name. If you deployed the 'gpt-4o' model as 'my-gpt4o-deployment', use model='my-gpt4o-deployment'.
02 Streaming Responses
Real-time token streaming for chat
python
response = client.chat.completions.create(
    model="gpt-4o-deployment",
    messages=[{"role": "user", "content": "Write a haiku"}],
    stream=True,
    temperature=0.7
)

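for chunk in response:
    # Azure can send chunks with an empty choices list (e.g., content-filter results)
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)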
output Golden leaves fall fast Autumn whispers in the wind Winter sleeps below
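Streaming responses don't include token usage by default. To get counts, pass stream_options={"include_usage": True}: the final chunk then carries usage (prompt_tokens, completion_tokens) with an empty choices list. This needs an api_version recent enough to accept stream_options.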
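A consumer that accumulates streamed text has to tolerate chunks with an empty `choices` list: Azure can send one up front (content-filter results), and with `include_usage` the last chunk carries only usage. A minimal sketch, run here against stand-in chunk objects built with `SimpleNamespace`; `collect_stream` and `_fake_chunk` are hypothetical helpers, not SDK functions.

```python
from types import SimpleNamespace
from typing import Any, Iterable, List, Optional, Tuple


def collect_stream(chunks: Iterable[Any]) -> Tuple[str, Optional[Any]]:
    """Join the streamed delta text and return it with the usage object, if any."""
    parts: List[str] = []
    usage = None
    for chunk in chunks:
        # Only index into choices when it is non-empty (content-filter
        # chunks and the trailing usage-only chunk have no choices).
        if chunk.choices:
            text = chunk.choices[0].delta.content
            if text:
                parts.append(text)
        # Populated only on the final chunk when include_usage is requested.
        if getattr(chunk, "usage", None) is not None:
            usage = chunk.usage
    return "".join(parts), usage


def _fake_chunk(content=None, usage=None, empty=False):
    """Stand-in for a ChatCompletionChunk so the demo runs offline."""
    choices = [] if empty else [SimpleNamespace(delta=SimpleNamespace(content=content))]
    return SimpleNamespace(choices=choices, usage=usage)


demo = [
    _fake_chunk(empty=True),  # e.g. prompt filter results
    _fake_chunk("Golden "),
    _fake_chunk("leaves"),
    _fake_chunk(None),  # a delta with no text
    _fake_chunk(empty=True, usage=SimpleNamespace(total_tokens=42)),
]
text, usage = collect_stream(demo)
print(text)  # Golden leaves
print(usage.total_tokens)  # 42
```

With the real client, pass the iterator returned by `client.chat.completions.create(..., stream=True, stream_options={"include_usage": True})` to `collect_stream`.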
03 Text Embeddings
Generate embeddings for RAG/search
python
response = client.embeddings.create(
    model="text-embedding-3-small",  # your deployment name
    input="The quick brown fox"
)

embedding = response.data[0].embedding
print(f"Embedding dimensions: {len(embedding)}")
print(f"First 5 values: {embedding[:5]}")
output Embedding dimensions: 1536 First 5 values: [...]
text-embedding-3-small and text-embedding-3-large have different default output dimensions (1536 vs 3072), and even equal-length vectors from different models are not comparable. Mixing them in one index breaks similarity search.
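A cheap guard at query time catches the most obvious mixing mistake: vectors of different lengths. A minimal sketch of cosine similarity that refuses mismatched dimensions; `cosine_similarity` is a hypothetical helper, not part of the SDK.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity that raises instead of comparing vectors of different lengths."""
    if len(a) != len(b):
        raise ValueError(
            f"dimension mismatch: {len(a)} vs {len(b)}; were these produced by "
            "different embedding models or different `dimensions` settings?"
        )
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cannot compare a zero vector")
    return dot / (norm_a * norm_b)


print(cosine_similarity([1.0, 0.0], [1.0, 0.0]))  # 1.0
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))  # 0.0
```

A length check cannot catch everything: text-embedding-3 models accept a `dimensions` parameter, so two models can produce equal-length but incompatible vectors. Storing the deployment name next to each vector is the safer habit.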
04 Managed Identity (No API Key)
Authenticate via Azure AD without hardcoding keys
python
import os

from azure.identity import DefaultAzureCredential, get_bearer_token_provider
from openai import AzureOpenAI

credential = DefaultAzureCredential()
# The client needs a callable that returns a bearer token string for this scope
token_provider = get_bearer_token_provider(
    credential, "https://cognitiveservices.azure.com/.default"
)

client = AzureOpenAI(
    api_version="2024-10-01-preview",
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
    azure_ad_token_provider=token_provider
)

response = client.chat.completions.create(
    model="gpt-4o-deployment",
    messages=[{"role": "user", "content": "Hello"}]
)
print(response.choices[0].message.content)
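DefaultAzureCredential tries several credential types in a fixed order (environment variables first, then workload and managed identity, then developer logins such as the Azure CLI). On a developer machine it may authenticate as an unexpected identity, e.g. a stale Azure CLI login. For predictability, use a specific credential (ManagedIdentityCredential in production, AzureCliCredential locally). The identity also needs an RBAC role on the resource, such as Cognitive Services OpenAI User.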
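To keep the auth mode explicit rather than discovered, client construction can go through one small function that either uses a token provider you pass in or the API key, and fails loudly otherwise. A minimal sketch; `azure_client_kwargs` is a hypothetical helper, and the endpoint and version values are placeholders.

```python
from typing import Callable, Dict, Mapping, Optional


def azure_client_kwargs(
    env: Mapping[str, str],
    token_provider: Optional[Callable[[], str]] = None,
) -> Dict[str, object]:
    """Build keyword arguments for openai.AzureOpenAI with an explicit auth choice.

    An explicitly passed token_provider wins; otherwise AZURE_OPENAI_API_KEY is
    used; if neither is available, fail loudly instead of guessing.
    """
    kwargs: Dict[str, object] = {
        "azure_endpoint": env["AZURE_OPENAI_ENDPOINT"],
        "api_version": env["AZURE_OPENAI_API_VERSION"],
    }
    if token_provider is not None:
        kwargs["azure_ad_token_provider"] = token_provider
    elif env.get("AZURE_OPENAI_API_KEY"):
        kwargs["api_key"] = env["AZURE_OPENAI_API_KEY"]
    else:
        raise ValueError("pass a token_provider or set AZURE_OPENAI_API_KEY")
    return kwargs


env = {
    "AZURE_OPENAI_ENDPOINT": "https://your-resource.openai.azure.com/",
    "AZURE_OPENAI_API_VERSION": "2024-10-01-preview",
}
kwargs = azure_client_kwargs(env, token_provider=lambda: "fake-token")
print(sorted(kwargs))  # ['api_version', 'azure_ad_token_provider', 'azure_endpoint']
```

In real code the provider would come from azure.identity, for example `AzureOpenAI(**azure_client_kwargs(os.environ, get_bearer_token_provider(ManagedIdentityCredential(), "https://cognitiveservices.azure.com/.default")))`.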
05 Vision (Vision-Capable Deployment)
Analyze images with Azure OpenAI
python
import base64
import httpx

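image_url = "https://example.com/image.jpg"
image_resp = httpx.get(image_url)
image_resp.raise_for_status()  # stop here rather than encoding an HTTP error page
image_data = base64.standard_b64encode(image_resp.content).decode("utf-8")

response = client.chat.completions.create(
    model="my-gpt4o-deployment",  # deployment of a vision-capable model such as gpt-4o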
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What's in this image?"},
                {
                    "type": "image_url",
                    "image_url": {
                        "url": f"data:image/jpeg;base64,{image_data}"
                    }
                }
            ]
        }
    ]
)

print(response.choices[0].message.content)
output The image shows a sunset over mountains with...
Vision-capable models are not offered in every Azure region. Check regional model availability before deploying.
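The example above always labels the image as image/jpeg. A minimal sketch that derives the MIME type from the filename (via the standard `mimetypes` module) and builds the same text-plus-image message shape; `image_data_url` and `vision_message` are hypothetical helper names.

```python
import base64
import mimetypes
from typing import Dict, Optional


def image_data_url(data: bytes, filename: Optional[str] = None, mime: Optional[str] = None) -> str:
    """Encode image bytes as a data URL for an image_url content part."""
    if mime is None and filename:
        mime, _ = mimetypes.guess_type(filename)
    if mime is None or not mime.startswith("image/"):
        raise ValueError(f"cannot determine an image MIME type (got {mime!r})")
    encoded = base64.b64encode(data).decode("ascii")
    return f"data:{mime};base64,{encoded}"


def vision_message(prompt: str, data_url: str) -> Dict[str, object]:
    """Build one user message with a text part and an image part."""
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {"type": "image_url", "image_url": {"url": data_url}},
        ],
    }


url = image_data_url(b"abc", filename="photo.png")
print(url)  # data:image/png;base64,YWJj
```

The resulting dict goes straight into `messages=[vision_message("What's in this image?", url)]`.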

Chat Completions Parameters

client.chat.completions.create()

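Parameter | Type | Default | Notes
model | str | required | Azure deployment name (NOT 'gpt-4o'). Use the exact deployment name from the Azure portal.
messages | list[dict] | required | List of role/content dicts. Roles: 'system', 'user', 'assistant', 'tool' ('function' is deprecated).
temperature | float | 1.0 | 0.0-2.0. Lower = more deterministic, higher = more varied. 0.0 is the most repeatable setting but does not guarantee identical output.
top_p | float | 1.0 | 0.0-1.0. Nucleus sampling. Lower = narrower token choices. Usually tune this or temperature, not both.
max_tokens | int | none | Max output tokens. If unset, output is bounded by the model's output limit and the remaining context window. o-series reasoning models use max_completion_tokens instead.
stream | bool | False | True = stream tokens as they are generated. False = return the full response at once.
frequency_penalty | float | 0.0 | -2.0 to 2.0. Higher = penalizes tokens in proportion to how often they have appeared, reducing repetition.
presence_penalty | float | 0.0 | -2.0 to 2.0. Higher = penalizes any token that has already appeared, encouraging new topics.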
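The service validates these ranges itself, but a client-side check fails before spending a round trip and points at the offending parameter. A minimal sketch using the ranges from the parameter table; `PARAM_RANGES` and `check_sampling_params` are hypothetical names.

```python
from typing import Dict, Tuple

# Documented ranges from the parameter table (client-side check only).
PARAM_RANGES: Dict[str, Tuple[float, float]] = {
    "temperature": (0.0, 2.0),
    "top_p": (0.0, 1.0),
    "frequency_penalty": (-2.0, 2.0),
    "presence_penalty": (-2.0, 2.0),
}


def check_sampling_params(**params: float) -> None:
    """Raise ValueError if a range-checked parameter is out of bounds."""
    for name, value in params.items():
        bounds = PARAM_RANGES.get(name)
        if bounds is None:
            continue  # not a range-checked parameter; leave it to the service
        low, high = bounds
        if not low <= value <= high:
            raise ValueError(f"{name}={value} is outside [{low}, {high}]")


check_sampling_params(temperature=0.7, presence_penalty=0.5)  # within range: no error
```

Typical use is to keep sampling settings in one dict, call `check_sampling_params(**params)`, then pass the same `**params` to `client.chat.completions.create`.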

Core API Methods

Method / Property | Description | Returns
client.chat.completions.create() | Generate chat completions. Supports streaming, vision, function calling. | ChatCompletion, or Iterator[ChatCompletionChunk] if stream=True
client.embeddings.create() | Generate text embeddings for semantic search and RAG. | CreateEmbeddingResponse with data[0].embedding (list of floats)
client.completions.create() | Legacy text completion (not recommended). Use chat.completions instead. | Completion
client.images.generate() | Generate images from text. Requires a dall-e-3 deployment. | ImagesResponse with data[0].url

Common Errors & Fixes

01 AuthenticationError: Invalid credentials

Cause: Missing or invalid AZURE_OPENAI_API_KEY, or a key from a different resource than the one AZURE_OPENAI_ENDPOINT points to.

Fix:
python
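# Verify the env vars are set (without printing the secret itself):
import os
from openai import AzureOpenAI

print("AZURE_OPENAI_API_KEY set:", bool(os.environ.get("AZURE_OPENAI_API_KEY")))
print("AZURE_OPENAI_ENDPOINT:", os.environ.get("AZURE_OPENAI_ENDPOINT"))

# Or pass them explicitly (placeholders; don't hardcode real keys in source):
client = AzureOpenAI(
    api_key="your-key",
    api_version="2024-10-01-preview",
    azure_endpoint="https://your-resource.openai.azure.com/"
)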
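A startup check that lists every configuration problem at once is faster to debug than one exception at a time. A minimal sketch, assuming key-based auth (with managed identity, AZURE_OPENAI_API_KEY is not needed); `check_azure_env` is a hypothetical helper, and it only checks presence and URL shape, not whether the key is valid.

```python
from typing import List, Mapping
from urllib.parse import urlparse

REQUIRED_VARS = ("AZURE_OPENAI_API_KEY", "AZURE_OPENAI_ENDPOINT", "AZURE_OPENAI_API_VERSION")


def check_azure_env(env: Mapping[str, str]) -> List[str]:
    """Return human-readable problems with the settings (empty list means none found)."""
    problems = [f"{name} is not set" for name in REQUIRED_VARS if not env.get(name)]
    endpoint = env.get("AZURE_OPENAI_ENDPOINT", "")
    if endpoint:
        parsed = urlparse(endpoint)
        # A bare hostname parses with no scheme and no netloc.
        if parsed.scheme != "https" or not parsed.netloc:
            problems.append(f"AZURE_OPENAI_ENDPOINT should be an https URL, got {endpoint!r}")
    return problems


for problem in check_azure_env({"AZURE_OPENAI_ENDPOINT": "your-resource.openai.azure.com"}):
    print(problem)
```

In an application this would run once as `check_azure_env(os.environ)` before constructing the client.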
02 NotFoundError (404): DeploymentNotFound

Cause: Using the model name instead of the deployment name, or the deployment doesn't exist on this resource (a deployment created only minutes ago may not be ready yet).

Fix:
python
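# Use your Azure deployment name:
# WRONG: model="gpt-4o"
# RIGHT:
client.chat.completions.create(
    model="my-gpt4o-deployment",  # name from the Azure portal
    messages=[{"role": "user", "content": "Hello"}]
)

# To list deployments (requires azure-mgmt-cognitiveservices and azure-identity):
from azure.identity import DefaultAzureCredential
from azure.mgmt.cognitiveservices import CognitiveServicesManagementClient

mgmt = CognitiveServicesManagementClient(DefaultAzureCredential(), "<subscription-id>")
for d in mgmt.deployments.list("<resource-group>", "<resource-name>"):
    print(d.name, d.properties.model.name, d.properties.model.version)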
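To avoid scattering deployment names through the code, a project can keep one mapping from model names to deployment names and resolve them in one place, with an error that says what is configured. A minimal sketch; `DEPLOYMENTS` and `deployment_for` are hypothetical, and the deployment names are placeholders for the ones in your Azure portal.

```python
from typing import Mapping

# Hypothetical mapping from model names to this project's deployment names.
DEPLOYMENTS: Mapping[str, str] = {
    "gpt-4o": "my-gpt4o-deployment",
    "text-embedding-3-small": "my-embedding-deployment",
}


def deployment_for(model_name: str, deployments: Mapping[str, str] = DEPLOYMENTS) -> str:
    """Translate a model name into the deployment name Azure expects."""
    try:
        return deployments[model_name]
    except KeyError:
        known = ", ".join(sorted(deployments)) or "none"
        raise LookupError(
            f"no deployment configured for model {model_name!r} (known: {known})"
        ) from None


print(deployment_for("gpt-4o"))  # my-gpt4o-deployment
```

Calls then read `model=deployment_for("gpt-4o")`, which keeps the model-vs-deployment distinction visible at every call site.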
03 RateLimitError: 429 Quota exceeded

Cause: Exceeded deployment token-per-minute (TPM) quota or request limits.

Fix:
python
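# Retry with exponential backoff:
import time
from openai import RateLimitError

for attempt in range(5):
    try:
        response = client.chat.completions.create(...)
        break
    except RateLimitError:
        if attempt == 4:
            raise  # give up after the last attempt
        wait = 2 ** attempt
        print(f"Rate limited. Waiting {wait}s")
        time.sleep(wait)

# The SDK also retries 429s on its own; tune with AzureOpenAI(..., max_retries=...).
# Or increase the deployment's TPM quota in the Azure portal (Quotas).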
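When many workers hit the same deployment, fixed backoff makes them retry in lockstep. A minimal sketch of capped exponential backoff with full jitter that re-raises after the last attempt; `retry_with_backoff` is a hypothetical helper, and the demo uses a stand-in exception so it runs offline.

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    fn: Callable[[], T],
    retryable: Tuple[Type[BaseException], ...],
    max_attempts: int = 5,
    base: float = 1.0,
    cap: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn(), retrying on `retryable` errors with capped exponential backoff and full jitter."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be >= 1")
    for attempt in range(max_attempts):
        try:
            return fn()
        except retryable:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            # Full jitter: sleep a random time in [0, min(cap, base * 2**attempt)].
            sleep(random.uniform(0, min(cap, base * 2 ** attempt)))
    raise AssertionError("unreachable")  # the loop always returns or raises


class FakeRateLimitError(Exception):
    """Stand-in for openai.RateLimitError so the demo runs offline."""


calls = {"n": 0}


def flaky() -> str:
    calls["n"] += 1
    if calls["n"] < 3:
        raise FakeRateLimitError("429")
    return "ok"


result = retry_with_backoff(flaky, (FakeRateLimitError,), sleep=lambda s: None)
print(result, calls["n"])  # ok 3
```

Against the real client this would be `retry_with_backoff(lambda: client.chat.completions.create(...), (RateLimitError,))`. The injectable `sleep` exists only so the behavior can be tested without waiting.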
04 BadRequestError: context_length_exceeded

Cause: Input tokens plus the requested output (max_tokens) exceed the model's context window.

Fix:
python
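# Check the token count before sending (o200k_base is the encoding gpt-4o uses;
# this counts message text only and ignores per-message overhead):
import tiktoken

messages = [{"role": "user", "content": "..."}]
enc = tiktoken.get_encoding("o200k_base")
tokens = sum(len(enc.encode(m["content"])) for m in messages)
print(f"Token count: {tokens}")

# Or lower max_tokens so input + requested output fits the window:
response = client.chat.completions.create(
    model="gpt-4o-deployment",
    messages=messages,
    max_tokens=500  # limit output
)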
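When a conversation keeps growing, dropping the oldest turns keeps the prompt under a token budget. A minimal sketch, assuming system messages should always be kept and that the caller supplies `count_tokens` (for example `lambda t: len(enc.encode(t))` with a tiktoken encoder); `trim_to_budget` is a hypothetical helper and ignores per-message formatting overhead, so leave headroom in the budget.

```python
from typing import Callable, Dict, List


def trim_to_budget(
    messages: List[Dict[str, str]],
    count_tokens: Callable[[str], int],
    budget: int,
) -> List[Dict[str, str]]:
    """Drop the oldest non-system messages until the estimated prompt fits `budget`.

    System messages are kept and placed first; if they alone exceed the
    budget, the result can still be over budget.
    """
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]

    def total(msgs: List[Dict[str, str]]) -> int:
        return sum(count_tokens(m["content"]) for m in msgs)

    while rest and total(system + rest) > budget:
        rest.pop(0)  # remove the oldest turn first
    return system + rest


history = [
    {"role": "system", "content": "You are helpful."},
    {"role": "user", "content": "one two three four"},
    {"role": "assistant", "content": "five six"},
    {"role": "user", "content": "seven"},
]
word_count = lambda text: len(text.split())  # crude stand-in for a real tokenizer
trimmed = trim_to_budget(history, word_count, budget=6)
print([m["content"] for m in trimmed])  # ['You are helpful.', 'five six', 'seven']
```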

Azure OpenAI vs OpenAI API

Feature | Azure OpenAI | OpenAI API
Cost model | Pay-per-token, regional pricing | Pay-per-token, global pricing
Compliance | FedRAMP, SOC 2, ISO, HIPAA-eligible | Standard SLAs, no FedRAMP
Networking | VNet integration, private endpoints | Public API only
Quota control | Per-deployment TPM limits in Azure | Account-level rate limits
Model selection | Deploy specific model versions | Aliases track the latest snapshot; dated snapshots can be pinned
Authentication | API key or Microsoft Entra ID (formerly Azure AD) managed identity | API key only
Data residency | Standard (regional) deployments process data in the resource's region; Global and Data Zone deployment types may process it elsewhere | OpenAI-managed infrastructure, fewer regional controls

Production Gotchas

API Version Mismatch Breaks Features

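Azure ships new API versions frequently (preview versions especially) and retires old previews. Function calling, vision, and tool use are only accepted from particular api_version strings onward. Using api_version='2024-02-15-preview' for a feature that requires '2024-10-01-preview' usually fails with a 400 error (e.g., an unrecognized request argument) rather than working. Pin the exact version your features need, and retest before switching versions or when your pinned version is scheduled for retirement.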
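A startup assertion can catch an outdated pin before the first request fails. A minimal sketch, assuming api_version strings of the form YYYY-MM-DD or YYYY-MM-DD-preview; `parse_api_version` and `at_least` are hypothetical helpers, and the minimum versions must come from Azure's docs for the features you use. Date order is only a heuristic: a GA version does not necessarily include every feature of an earlier-dated preview.

```python
from datetime import date
from typing import Tuple

PREVIEW_SUFFIX = "-preview"


def parse_api_version(version: str) -> Tuple[date, bool]:
    """Split '2024-10-01-preview' into (date(2024, 10, 1), True)."""
    is_preview = version.endswith(PREVIEW_SUFFIX)
    day = version[: -len(PREVIEW_SUFFIX)] if is_preview else version
    return date.fromisoformat(day), is_preview


def at_least(version: str, minimum: str) -> bool:
    """True if `version` is dated on or after `minimum` (the preview tag is ignored)."""
    return parse_api_version(version)[0] >= parse_api_version(minimum)[0]


print(at_least("2024-02-15-preview", "2024-10-01-preview"))  # False
print(at_least("2024-12-01-preview", "2024-10-01-preview"))  # True
```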

Deployment Name ≠ Model Name

This is the #1 confusion. Your Azure deployment might be named 'gpt-4o-prod' but deploy the 'gpt-4o' model. When calling create(), use model='gpt-4o-prod' (deployment name), not model='gpt-4o' (model name). Swapping these causes 404 NotFoundError.

Token Limits Vary by Region

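TPM (tokens-per-minute) quota is allocated per subscription, per region, and per model, so the available amount differs by region: East US may have 40K TPM while Central US has 10K, depending on your subscription and model. Deploying to a quota-limited region under production load causes 429 RateLimitError. Monitor quota utilization and spread load across regions or deployments as needed.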
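Throttling on the client side, before Azure returns 429s, smooths bursts against a known TPM budget. A minimal sketch of a sliding one-minute window; `TpmLimiter` is hypothetical and only sees this process's traffic, so other clients sharing the deployment still count against Azure's quota. Because accounting happens before the response exists, a conservative choice is to record prompt tokens plus max_tokens per request.

```python
import time
from collections import deque
from typing import Callable, Deque, Tuple


class TpmLimiter:
    """Client-side sliding-window limiter for tokens per minute."""

    def __init__(self, tpm: int, clock: Callable[[], float] = time.monotonic) -> None:
        self.tpm = tpm
        self.clock = clock
        self.events: Deque[Tuple[float, int]] = deque()  # (timestamp, tokens)

    def _used(self, now: float) -> int:
        # Forget usage recorded 60 seconds ago or earlier.
        while self.events and now - self.events[0][0] >= 60.0:
            self.events.popleft()
        return sum(tokens for _, tokens in self.events)

    def try_acquire(self, tokens: int) -> bool:
        """Record `tokens` and return True if they fit in the current window."""
        now = self.clock()
        if self._used(now) + tokens > self.tpm:
            return False
        self.events.append((now, tokens))
        return True


fake_now = [0.0]
limiter = TpmLimiter(tpm=1000, clock=lambda: fake_now[0])
print(limiter.try_acquire(800))  # True
print(limiter.try_acquire(300))  # False: 1100 > 1000 within the same minute
fake_now[0] = 61.0
print(limiter.try_acquire(300))  # True: the first 800 tokens aged out
```

A caller that gets False can wait briefly and retry, or route the request to another region's deployment.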

Streaming Doesn't Include Usage Stats

By default, when stream=True no chunk includes usage (prompt_tokens, completion_tokens). Pass stream_options={"include_usage": True} (on an api_version that supports it) to get usage on the final chunk instead of making a separate non-streamed call. This matters for cost tracking.

Vision Deployments Require Specific Regions

Vision-capable models are offered only in select regions, and a model that isn't available in your resource's region can't be deployed there. Verify regional availability in the Azure docs before choosing a region for vision workloads.

Verified 2026-04 · v2024-12-01 · gpt-4o, gpt-4.1, text-embedding-3-small, text-embedding-3-large