How to implement a circuit breaker for AI services
Quick answer
Implement a circuit breaker by monitoring failures of AI service calls, temporarily halting requests once consecutive errors reach a threshold, and letting a trial request through after a cooldown period. This prevents cascading failures and improves system resilience when you use OpenAI or other AI APIs.
PREREQUISITES
- Python 3.8+
- OpenAI API key (free tier works)
- `pip install "openai>=1.0"` (quote the version specifier so the shell does not treat `>` as a redirect)
Setup
Install the openai Python package and store your API key in the `OPENAI_API_KEY` environment variable so that it stays out of your source code.
```bash
pip install openai
```
Step by step
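This example demonstrates a simple circuit breaker for AI service calls using the OpenAI SDK. It counts consecutive failures and opens the circuit to block calls once a threshold is reached. After a cooldown, it lets a trial call through (the HALF_OPEN state). A success closes the circuit, and a failure reopens it.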
```python
import os
import time

from openai import OpenAI


class CircuitBreakerOpenError(Exception):
    """Raised when the circuit is OPEN and a call is rejected without being attempted."""


class CircuitBreaker:
    def __init__(self, failure_threshold=3, recovery_time=10):
        self.failure_threshold = failure_threshold
        self.recovery_time = recovery_time  # seconds to wait before a trial call
        self.failure_count = 0  # consecutive failures
        self.last_failure_time = None
        self.state = "CLOSED"  # CLOSED, OPEN, HALF_OPEN

    def call(self, func, *args, **kwargs):
        if self.state == "OPEN":
            # time.monotonic() is not affected by system clock adjustments
            elapsed = time.monotonic() - self.last_failure_time
            if elapsed >= self.recovery_time:
                # Cooldown is over: let a trial call through
                self.state = "HALF_OPEN"
            else:
                raise CircuitBreakerOpenError("Circuit breaker is OPEN: calls are blocked")

        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failure_count += 1
            self.last_failure_time = time.monotonic()
            # A failed trial call, or too many consecutive failures, opens the circuit
            if self.state == "HALF_OPEN" or self.failure_count >= self.failure_threshold:
                self.state = "OPEN"
            raise  # re-raise with the original traceback
        else:
            # Any success closes the circuit and resets the consecutive-failure count
            self.state = "CLOSED"
            self.failure_count = 0
            return result


# Initialize the OpenAI client with the key from the environment
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])


# Call the AI service. The simulated failure is raised inside the wrapped
# function so that the breaker can see it and count it.
def call_ai_service(prompt):
    if prompt == "Cause error":
        raise RuntimeError("Simulated AI service failure")
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content


# Open after 3 consecutive failures; allow a trial call after 15 seconds
breaker = CircuitBreaker(failure_threshold=3, recovery_time=15)

# Example usage: three simulated failures trip the breaker, so the last call is blocked
prompts = ["Hello AI!", "Cause error", "Cause error", "Cause error", "Hello again!"]
for prompt in prompts:
    try:
        result = breaker.call(call_ai_service, prompt)
        print(f"AI response: {result}")
    except Exception as e:
        print(f"Call failed: {e}")
    time.sleep(2)
```
Output
```
AI response: <AI-generated text>
Call failed: Simulated AI service failure
Call failed: Simulated AI service failure
Call failed: Simulated AI service failure
Call failed: Circuit breaker is OPEN: calls are blocked
```
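The demo above never shows the breaker recovering, because the cooldown outlasts the loop. The following sketch re-creates the same CLOSED/OPEN/HALF_OPEN logic with a hypothetical `clock` parameter and a hypothetical `FakeClock` helper. Neither is part of the original example. They let you check the full cycle instantly, without real API calls or sleeping. The names `flaky`, `healthy`, `run_demo`, and `run_failed_trial` are also illustrative.

```python
import time


class CircuitBreakerOpenError(Exception):
    """Raised when a call is rejected because the circuit is OPEN."""


class CircuitBreaker:
    """Same state logic as above, plus a hypothetical injectable clock."""

    def __init__(self, failure_threshold=3, recovery_time=10, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.recovery_time = recovery_time
        self.clock = clock  # any zero-argument callable returning seconds
        self.failure_count = 0
        self.last_failure_time = None
        self.state = "CLOSED"

    def call(self, func, *args, **kwargs):
        if self.state == "OPEN":
            if self.clock() - self.last_failure_time >= self.recovery_time:
                self.state = "HALF_OPEN"  # cooldown over: allow a trial call
            else:
                raise CircuitBreakerOpenError("Circuit breaker is OPEN: calls are blocked")
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failure_count += 1
            self.last_failure_time = self.clock()
            if self.state == "HALF_OPEN" or self.failure_count >= self.failure_threshold:
                self.state = "OPEN"
            raise
        self.state = "CLOSED"  # success closes the circuit
        self.failure_count = 0
        return result


class FakeClock:
    """Hypothetical test clock that only moves when the test advances it."""

    def __init__(self):
        self.now = 0.0

    def __call__(self):
        return self.now


def flaky():
    raise ConnectionError("simulated outage")


def healthy():
    return "ok"


def run_demo():
    """Walk CLOSED -> OPEN -> rejected -> HALF_OPEN -> CLOSED."""
    clock = FakeClock()
    breaker = CircuitBreaker(failure_threshold=2, recovery_time=10, clock=clock)
    trace = []
    for _ in range(2):
        try:
            breaker.call(flaky)
        except ConnectionError:
            pass
        trace.append(breaker.state)
    try:
        breaker.call(healthy)  # still inside the cooldown window
    except CircuitBreakerOpenError:
        trace.append("rejected")
    clock.now += 10  # advance past the cooldown without sleeping
    trace.append(breaker.call(healthy))  # the trial call succeeds
    trace.append(breaker.state)
    return trace


def run_failed_trial():
    """A failed trial call reopens the circuit and restarts the cooldown."""
    clock = FakeClock()
    breaker = CircuitBreaker(failure_threshold=1, recovery_time=5, clock=clock)
    for _ in range(2):
        try:
            breaker.call(flaky)
        except ConnectionError:
            pass
        clock.now += 5
    return breaker.state, breaker.last_failure_time


if __name__ == "__main__":
    print(run_demo())
    print(run_failed_trial())
```

Here `run_demo()` returns `['CLOSED', 'OPEN', 'rejected', 'ok', 'CLOSED']`, and `run_failed_trial()` returns `('OPEN', 5.0)`. The second result shows that a failed trial call reopens the circuit and restarts the cooldown from the time of that failure. Injecting the clock is a common design choice for testing. In normal use, the default `time.monotonic` applies.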
Common variations
Because the breaker wraps any callable, it works equally well with other SDKs such as Anthropic or Google Gemini. For non-blocking AI calls, you can build an async variant on asyncio. Tune thresholds and cooldowns to your SLA and to the error patterns you observe.
```python
import asyncio


class AsyncCircuitBreaker:
    def __init__(self, failure_threshold=3, recovery_time=10):
        self.failure_threshold = failure_threshold
        self.recovery_time = recovery_time  # seconds
        self.failure_count = 0  # consecutive failures
        self.last_failure_time = None
        self.state = "CLOSED"  # CLOSED, OPEN, HALF_OPEN

    async def call(self, func, *args, **kwargs):
        # The running loop's clock is monotonic, so it is safe for measuring cooldowns
        loop = asyncio.get_running_loop()
        if self.state == "OPEN":
            elapsed = loop.time() - self.last_failure_time
            if elapsed >= self.recovery_time:
                # Note: concurrent callers may all pass while HALF_OPEN;
                # guard this with an asyncio.Lock if you need a single probe
                self.state = "HALF_OPEN"
            else:
                raise RuntimeError("Circuit breaker is OPEN: calls are blocked")

        try:
            result = await func(*args, **kwargs)
        except Exception:
            self.failure_count += 1
            self.last_failure_time = loop.time()
            if self.state == "HALF_OPEN" or self.failure_count >= self.failure_threshold:
                self.state = "OPEN"
            raise  # re-raise with the original traceback
        else:
            # Any success closes the circuit and resets the consecutive-failure count
            self.state = "CLOSED"
            self.failure_count = 0
            return result
```
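You can exercise the async breaker without network access. The following self-contained sketch repeats the class in compact form and drives it with a hypothetical `fake_ai_service` coroutine in place of a real API call. With the openai package (v1+), the real counterpart would be an `AsyncOpenAI` client, with `await client.chat.completions.create(...)` inside the wrapped function. The 0.2-second cooldown is an assumption, chosen only so that the demo finishes quickly.

```python
import asyncio


class AsyncCircuitBreaker:
    """Compact copy of the async breaker, so this sketch runs on its own."""

    def __init__(self, failure_threshold=3, recovery_time=10):
        self.failure_threshold = failure_threshold
        self.recovery_time = recovery_time
        self.failure_count = 0
        self.last_failure_time = None
        self.state = "CLOSED"

    async def call(self, func, *args, **kwargs):
        loop = asyncio.get_running_loop()
        if self.state == "OPEN":
            if loop.time() - self.last_failure_time >= self.recovery_time:
                self.state = "HALF_OPEN"
            else:
                raise RuntimeError("Circuit breaker is OPEN: calls are blocked")
        try:
            result = await func(*args, **kwargs)
        except Exception:
            self.failure_count += 1
            self.last_failure_time = loop.time()
            if self.state == "HALF_OPEN" or self.failure_count >= self.failure_threshold:
                self.state = "OPEN"
            raise
        self.state = "CLOSED"
        self.failure_count = 0
        return result


async def fake_ai_service(prompt):
    """Hypothetical stand-in for an awaited AI API call."""
    await asyncio.sleep(0.01)  # pretend network latency
    if prompt == "Cause error":
        raise ConnectionError("Simulated AI service failure")
    return f"echo: {prompt}"


async def run_async_demo():
    # Short cooldown (an assumption) so the demo finishes quickly
    breaker = AsyncCircuitBreaker(failure_threshold=2, recovery_time=0.2)
    outcomes = []
    for prompt in ["Hello AI!", "Cause error", "Cause error", "Hello again!"]:
        try:
            outcomes.append(await breaker.call(fake_ai_service, prompt))
        except Exception as e:
            outcomes.append(f"failed: {e}")
    await asyncio.sleep(0.25)  # wait out the cooldown
    # This trial call succeeds, so the circuit closes again
    outcomes.append(await breaker.call(fake_ai_service, "After cooldown"))
    return outcomes


if __name__ == "__main__":
    for line in asyncio.run(run_async_demo()):
        print(line)
```

Running it prints five lines: one echo, two simulated failures, one rejection while the circuit is OPEN, and a successful trial call after the cooldown, which closes the circuit again.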
```python
# Usage with async AI call function
# async def async_call_ai_service(prompt):
#     ...
# breaker = AsyncCircuitBreaker()
```
Troubleshooting
- If you see frequent circuit breaker openings, check your AI service quota and network stability.
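- Distinguish transient errors (timeouts, connection errors, rate limits, 5xx responses) from permanent ones (malformed requests, authentication failures), and count only the transient ones. Otherwise a bug in your own requests can trip the circuit unnecessarily.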
- Adjust `failure_threshold` and `recovery_time` to balance availability and fault tolerance.
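One way to act on the transient-versus-permanent advice is to let the breaker count only selected exception types. The following sketch assumes a hypothetical `trip_on` tuple, which defaults to Python's built-in `TimeoutError` and `ConnectionError`. Other exceptions, such as a `ValueError` from a malformed request, still reach the caller but never trip the circuit. The OpenAI Python SDK (v1+) raises its own exception classes. With that SDK, you would pass something like `trip_on=(openai.APITimeoutError, openai.APIConnectionError, openai.RateLimitError, openai.InternalServerError)` instead. Which errors count as transient for your workload is a judgment call. `SelectiveCircuitBreaker`, `bad_request`, `timed_out`, and `final_state` are illustrative names.

```python
import time


class SelectiveCircuitBreaker:
    """Hypothetical variant: only exceptions listed in trip_on count as failures."""

    def __init__(self, failure_threshold=3, recovery_time=10,
                 trip_on=(TimeoutError, ConnectionError)):
        self.failure_threshold = failure_threshold
        self.recovery_time = recovery_time
        self.trip_on = trip_on  # tuple of exception types treated as transient
        self.failure_count = 0
        self.last_failure_time = None
        self.state = "CLOSED"

    def call(self, func, *args, **kwargs):
        if self.state == "OPEN":
            if time.monotonic() - self.last_failure_time >= self.recovery_time:
                self.state = "HALF_OPEN"
            else:
                raise RuntimeError("Circuit breaker is OPEN: calls are blocked")
        try:
            result = func(*args, **kwargs)
        except self.trip_on:
            # Transient error: count it toward opening the circuit
            self.failure_count += 1
            self.last_failure_time = time.monotonic()
            if self.state == "HALF_OPEN" or self.failure_count >= self.failure_threshold:
                self.state = "OPEN"
            raise
        # Reaching here means func succeeded; non-matching exceptions
        # propagated out of the try block without changing the state
        self.state = "CLOSED"
        self.failure_count = 0
        return result


def bad_request():
    raise ValueError("invalid prompt")  # permanent: retrying will not help


def timed_out():
    raise TimeoutError("upstream timed out")  # transient: the service may recover


def final_state(calls, failure_threshold=2):
    """Run each call through a fresh breaker and report its final state."""
    breaker = SelectiveCircuitBreaker(failure_threshold=failure_threshold)
    for fn in calls:
        try:
            breaker.call(fn)
        except Exception:
            pass  # errors are expected in this demo
    return breaker.state
```

For example, `final_state([bad_request] * 5)` stays `'CLOSED'`, while `final_state([bad_request, timed_out, timed_out])` ends `'OPEN'`. In this sketch, permanent errors neither count toward nor reset the failure streak.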
Key Takeaways
- Use a circuit breaker to prevent cascading failures when AI services are unstable.
- Track consecutive failures and open the circuit to block calls temporarily.
- After a cooldown, let a trial call through (half-open) and close the circuit only if it succeeds.
- Customize thresholds and recovery times based on your application's tolerance for failure.
- Implement async circuit breakers for non-blocking AI API calls.