Azure OpenAI enterprise cost optimization
Reduce Azure OpenAI costs by selecting smaller models such as gpt-4o-mini for less intensive tasks, and by using token limits and prompt engineering to reduce token consumption. Use batching and caching to minimize redundant calls, and apply Azure's budgeting and alerting features to control spending.
Prerequisites
- Python 3.8+
- Azure subscription with Azure OpenAI access
- Azure CLI installed and configured
- pip install azure-identity azure-mgmt-costmanagement openai>=1.0
Set up the Azure environment
Install Azure SDKs and configure authentication to access Azure Cost Management APIs and Azure OpenAI service. Set environment variables for secure API access.
pip install azure-identity azure-mgmt-costmanagement openai
Collecting azure-identity
Collecting azure-mgmt-costmanagement
Collecting openai
Successfully installed azure-identity azure-mgmt-costmanagement openai
Step-by-step cost monitoring and optimization
Use Azure Cost Management SDK to track your Azure OpenAI spending programmatically. Choose cost-effective models and optimize token usage by prompt design and batching requests.
import os
from azure.identity import DefaultAzureCredential
from azure.mgmt.costmanagement import CostManagementClient
from openai import AzureOpenAI

# Authenticate with Azure; the subscription is identified by the query scope below
credential = DefaultAzureCredential()
cost_client = CostManagementClient(credential)

# Query month-to-date cost for Azure OpenAI (Cognitive Services) resources
scope = f"/subscriptions/{os.environ['AZURE_SUBSCRIPTION_ID']}"
cost_query = {
    "type": "ActualCost",
    "timeframe": "MonthToDate",
    "dataset": {
        "granularity": "Daily",
        "filter": {
            "dimensions": {
                "name": "ResourceType",
                "operator": "In",
                "values": ["Microsoft.CognitiveServices/accounts"]
            }
        },
        "aggregation": {
            "totalCost": {
                "name": "PreTaxCost",
                "function": "Sum"
            }
        }
    }
}
cost_result = cost_client.query.usage(scope=scope, parameters=cost_query)
print(f"Azure OpenAI cost month-to-date: ${cost_result.rows[0][0]:.2f}")

# Use the Azure OpenAI client with a cost-efficient model
client = AzureOpenAI(
    api_key=os.environ['AZURE_OPENAI_API_KEY'],
    azure_endpoint=os.environ['AZURE_OPENAI_ENDPOINT'],
    api_version="2024-06-01",
)
response = client.chat.completions.create(
    model="gpt-4o-mini",  # the name of your model deployment
    messages=[{"role": "user", "content": "Summarize enterprise cost optimization."}],
    max_tokens=200
)
print(response.choices[0].message.content)
Azure OpenAI cost month-to-date: $123.45
Enterprise cost optimization involves selecting efficient models, monitoring usage, and reducing token consumption.
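The batching mentioned above can be sketched as follows. build_batched_prompt is a hypothetical helper, not part of any SDK: it packs several short tasks into one numbered prompt so that a single API call replaces several separate ones, cutting per-call overhead.

```python
from typing import List

def build_batched_prompt(tasks: List[str]) -> str:
    """Pack several short tasks into one numbered prompt so a single
    chat request replaces len(tasks) separate requests (hypothetical helper)."""
    lines = [f"{i}. {task}" for i, task in enumerate(tasks, start=1)]
    return (
        "Answer each numbered task separately, keeping answers brief:\n"
        + "\n".join(lines)
    )

tasks = [
    "Summarize our Q1 cloud spend in one sentence.",
    "List two ways to cut token usage.",
]
prompt = build_batched_prompt(tasks)
# Send `prompt` as one user message instead of len(tasks) requests, e.g.:
# client.chat.completions.create(model="gpt-4o-mini",
#     messages=[{"role": "user", "content": prompt}], max_tokens=200)
print(prompt)
```

Batching trades a slightly longer single prompt for fewer requests, which also reduces per-request latency overhead and rate-limit pressure.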
Common variations
Use async calls with asyncio for concurrent requests, stream responses to reduce perceived latency, or switch to a model like gpt-4o for higher quality at higher cost. Adjust max_tokens and prompt length to control token usage.
import os
import asyncio
from openai import AsyncAzureOpenAI

async def async_chat():
    client = AsyncAzureOpenAI(
        api_key=os.environ['AZURE_OPENAI_API_KEY'],
        azure_endpoint=os.environ['AZURE_OPENAI_ENDPOINT'],
        api_version="2024-06-01",
    )
    # Stream the response so tokens print as they arrive
    stream = await client.chat.completions.create(
        model="gpt-4o-mini",  # the name of your model deployment
        messages=[{"role": "user", "content": "Explain cost optimization."}],
        max_tokens=150,
        stream=True
    )
    async for chunk in stream:
        if chunk.choices and chunk.choices[0].delta.content:
            print(chunk.choices[0].delta.content, end='', flush=True)

asyncio.run(async_chat())
Cost optimization includes monitoring usage, selecting efficient models, and prompt engineering to reduce token consumption.
Troubleshooting cost spikes
If you notice unexpected cost spikes, verify your usage logs in Azure Portal and check for runaway or redundant API calls. Implement rate limiting and caching to reduce repeated requests. Use Azure budgets and alerts to get notified before costs escalate.
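The caching suggested above can be sketched with a simple in-memory store keyed on the model, prompt, and token limit. cached_completion is a hypothetical wrapper, not an SDK feature; the client argument stands in for the AzureOpenAI client from the setup step.

```python
import hashlib
import json

_cache = {}  # in-memory; swap for Redis or similar in production

def cached_completion(client, model, prompt, max_tokens=150):
    """Return a cached answer for repeated identical prompts,
    calling the API only on a cache miss (hypothetical sketch)."""
    key = hashlib.sha256(
        json.dumps([model, prompt, max_tokens]).encode()
    ).hexdigest()
    if key not in _cache:
        response = client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": prompt}],
            max_tokens=max_tokens,
        )
        _cache[key] = response.choices[0].message.content
    return _cache[key]
```

A process-local dict is enough to catch runaway retry loops that resend the same prompt; a shared cache is needed once multiple workers handle traffic.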
Key Takeaways
- Monitor Azure OpenAI usage with Azure Cost Management APIs to track spending in real time.
- Select smaller models like gpt-4o-mini for routine tasks to reduce token costs.
- Optimize prompts and batch requests to minimize token consumption and API calls.
- Use Azure budgets and alerts to proactively control enterprise spending.
- Implement caching and rate limiting to avoid redundant calls and unexpected cost spikes.
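The rate limiting called out above can be gated with a small sliding-window limiter. This is a generic sketch, not an Azure SDK feature; the clock is injectable so the behavior is easy to test.

```python
import time
from collections import deque

class RateLimiter:
    """Allow at most `max_calls` per `window` seconds (sliding window sketch)."""

    def __init__(self, max_calls, window, clock=time.monotonic):
        self.max_calls = max_calls
        self.window = window
        self.clock = clock
        self.calls = deque()  # timestamps of recent allowed calls

    def allow(self):
        """Return True if a call may proceed now, recording it if so."""
        now = self.clock()
        # Drop timestamps that have fallen out of the window
        while self.calls and now - self.calls[0] >= self.window:
            self.calls.popleft()
        if len(self.calls) < self.max_calls:
            self.calls.append(now)
            return True
        return False

# Usage: gate each API request
limiter = RateLimiter(max_calls=5, window=1.0)
if limiter.allow():
    pass  # safe to call client.chat.completions.create(...)
```

Checking limiter.allow() before every request caps the worst-case spend of a runaway loop at max_calls per window, regardless of the bug that caused it.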