Fallback chains: switching models on failure
Why this matters
Production LLM pipelines fail: rate limits get hit, API quotas run out, models go down. Without fallbacks, a single provider failure can take your whole application down with it. With them, you degrade gracefully to slower models or different providers instead of showing users an outage.
Explanation
What it is: A LangChain pattern that wraps a runnable together with an ordered list of backup runnables: if the first one raises an exception, execution automatically moves on to the next in sequence. It uses the `with_fallbacks()` method, which every LCEL runnable provides (the behavior described here is that of langchain-core 0.3.x).
How it works: Calling `chain.with_fallbacks([fallback_chain1, fallback_chain2])` returns a `RunnableWithFallbacks` that wraps your chain in error-handling logic. If `chain.invoke()` raises an exception, the wrapper catches it and calls `fallback_chain1.invoke()` with the same input. If that also fails, it tries `fallback_chain2.invoke()`. The first successful result is returned; if every runnable fails, the wrapper re-raises the exception from the first (primary) runnable, not the last one.
When to use it: Use fallbacks when you have multiple viable execution paths (e.g., GPT-4 → GPT-3.5 → a local open-source model), when some paths are more expensive but more reliable (premium API → standard API), or when you need geographic or provider redundancy. The key is that each fallback should fail independently of the others: a different API key, a different provider, a different model class.
Analogy
Like having a primary supplier, a backup supplier, and a third-party vendor. If your primary supplier runs out of stock, you automatically try the backup. If the backup is also out, you go to the third-party vendor, which is slower. The customer gets their order either way.
Code
```python
import os

from langchain_anthropic import ChatAnthropic
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI

# Deliberately invalid keys, so every provider call fails with an authentication error.
os.environ["OPENAI_API_KEY"] = "sk-test-key-invalid"
os.environ["ANTHROPIC_API_KEY"] = "sk-ant-test-key-invalid"

prompt = ChatPromptTemplate.from_template("Explain {topic} in one sentence.")
output_parser = StrOutputParser()

# max_retries=0 and a short timeout make each model fail fast instead of retrying.
# Model IDs change over time; check each provider's current model list.
primary_model = ChatOpenAI(
    model="gpt-4-turbo",
    temperature=0.7,
    max_retries=0,
    timeout=2,
)
fallback_model_1 = ChatOpenAI(
    model="gpt-3.5-turbo",
    temperature=0.7,
    max_retries=0,
    timeout=2,
)
fallback_model_2 = ChatAnthropic(
    model="claude-3-sonnet-20240229",
    temperature=0.7,
    max_retries=0,
    timeout=2,
)

# All three chains share the prompt and parser: same input ({"topic": ...}), same str output.
primary_chain = prompt | primary_model | output_parser
fallback_chain_1 = prompt | fallback_model_1 | output_parser
fallback_chain_2 = prompt | fallback_model_2 | output_parser

# Tried in order: primary_chain, then fallback_chain_1, then fallback_chain_2.
resilient_chain = primary_chain.with_fallbacks(
    [fallback_chain_1, fallback_chain_2]
)

try:
    result = resilient_chain.invoke({"topic": "quantum computing"})
    print(result)
except Exception as e:
    print(f"All chains failed: {type(e).__name__}: {str(e)[:100]}")
```
Output:
All chains failed: AuthenticationError: Incorrect API key provided. You can find your API key at https://platform.openai.com/account/api-keys.
What just happened?
The code built a primary chain (GPT-4 Turbo) with two fallback chains (GPT-3.5 Turbo, then Claude). When `invoke()` was called, the primary chain failed because the OpenAI API key is invalid. LangChain caught that exception and tried the first fallback (GPT-3.5 Turbo), which failed for the same reason, since it uses the same OpenAI key. It then tried the second fallback (Claude), which failed because the Anthropic key is also invalid. With every chain exhausted, the wrapper re-raised the first exception it had caught, the primary chain's OpenAI `AuthenticationError`, which the try-except block then printed.
Common gotcha
Developers often assume fallbacks will activate when the model is slow or returns low-quality output. Fallbacks trigger only on exceptions (timeout, authentication, rate limit, network error), not on bad responses; a slow model triggers one only if it exceeds the configured timeout and raises. If your primary model returns nonsense without erroring, the fallback never runs. To fall back on output quality, add a validation step inside the primary chain, after the model, that raises an exception when the output fails your checks. Also, nothing checks that the fallbacks match the primary chain's input/output schema: the wrapper reports the primary's schema, so each fallback must independently accept the same input and return the same output type, or the mismatch surfaces only when that fallback actually runs.
Error recovery
AuthenticationError, RateLimitError, TimeoutError, InvalidRequestError, RunnableConfigError
Experienced dev note
In production, don't chain fallbacks blindly: instrument each fallback with a counter or a logged warning when it activates. You need visibility into *why* your primary chain failed and *how often*. This is your canary: if you're hitting fallbacks constantly, your primary model is degraded or your traffic has spiked. A silent fallback success is a hidden debt; you're paying in latency and cost without knowing it. Also, test your fallbacks explicitly: many teams add fallback chains in development, never trigger them, and only discover they are broken in production, when they are needed most.
Check your understanding
If your primary OpenAI chain has a 2-second timeout and your first fallback (also OpenAI) has a 5-second timeout, why might the fallback still not help during a genuine outage on OpenAI's infrastructure? What would you need to change to fix this?
Answer hint
A correct answer recognizes that both chains depend on the same provider (OpenAI), so an infrastructure outage at OpenAI affects both. The longer timeout on the fallback doesn't matter if the service is down. The fix requires a fallback using a *different provider* entirely (e.g., Anthropic, local model), not just a different model parameter.