Code Advanced hard · 8 min

Fallback chains: switching models on failure

What you will learn

Use LangChain's `with_fallbacks()` to automatically retry with alternative models or chains when the primary chain fails.

Why this matters

Production LLM pipelines fail: rate limits hit, API quotas run out, models go down. Without fallbacks, your entire application stops. With them, you degrade gracefully to slower models or different providers without user-facing outages.

Skip if: Don't use fallbacks when failures are unrecoverable (bad API key, malformed prompt, invalid input). Don't chain together models with identical failure modes (two OpenAI calls during an OpenAI outage). Don't fall back to a slower model if latency is the primary constraint: you've traded speed for reliability in the wrong direction.

Explanation

What it is: A LangChain pattern that wraps a runnable with one or more alternatives, so that if the first fails, execution automatically attempts the next in sequence. It uses the `with_fallbacks()` method available on all LCEL runnables in langchain-core 0.3.x.

How it works: When you call `chain.with_fallbacks([fallback_chain1, fallback_chain2])`, LangChain wraps your chain in error-handling logic. If `chain.invoke()` raises an exception, the wrapper catches it and tries `fallback_chain1.invoke()` with the same input. If that fails, it tries `fallback_chain2.invoke()`. The first successful result is returned; if every chain fails, the exception from the first failure (the primary chain's) is re-raised, not the last one.

When to use it: Use fallbacks when you have multiple viable execution paths (e.g., GPT-4 → GPT-3.5 → local open-source model), when some paths are more expensive but more reliable (premium API → standard API), or when you need geographic or provider redundancy. The key is that each fallback should be genuinely independent: different API key, different provider, different model class.
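The control flow described above can be approximated in plain Python. This is a mental-model sketch, not LangChain's implementation: `run_with_fallbacks` and the three stand-in functions are hypothetical names, and re-raising the first captured error reflects a reading of langchain-core's behavior rather than a documented guarantee.

```python
from typing import Any, Callable, Sequence


def run_with_fallbacks(
    runnables: Sequence[Callable[[Any], Any]],
    value: Any,
    exceptions_to_handle: tuple = (Exception,),
) -> Any:
    """Try each callable in order and return the first successful result."""
    first_error = None
    for runnable in runnables:
        try:
            return runnable(value)
        except exceptions_to_handle as err:
            # Remember the earliest failure; later failures are swallowed.
            if first_error is None:
                first_error = err
    if first_error is None:
        raise ValueError("no runnables were provided")
    raise first_error


def primary(topic: str) -> str:
    # Stand-in for a primary model call that times out.
    raise TimeoutError("primary timed out")


def fallback_1(topic: str) -> str:
    # Stand-in for a first fallback that is also unavailable.
    raise ConnectionError("fallback 1 unreachable")


def fallback_2(topic: str) -> str:
    # Stand-in for a second fallback that works.
    return f"fallback 2 answered: {topic}"


result = run_with_fallbacks([primary, fallback_1, fallback_2], "quantum computing")
print(result)
```

Every callable receives the same input, which is why each fallback has to accept the primary chain's input shape.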

Analogy

Like having a primary supplier, a backup supplier, and a third-party vendor. If your primary supplier runs out of stock, you automatically try your backup. If they're also out, you go to the third-party vendor, which is slower. The customer gets their order either way.

Code

Illustrative only - not runnable without a valid API key

python

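import os

from langchain_anthropic import ChatAnthropic
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI

# Deliberately invalid keys: every chain fails, so the whole fallback path runs.
os.environ['OPENAI_API_KEY'] = 'sk-test-key-invalid'
os.environ['ANTHROPIC_API_KEY'] = 'sk-ant-test-key-invalid'

prompt = ChatPromptTemplate.from_template(
    "Explain {topic} in one sentence."
)

output_parser = StrOutputParser()

# max_retries=0 and a short timeout make each model fail fast,
# so the next chain gets its turn instead of waiting on internal retries.
primary_model = ChatOpenAI(
    model="gpt-4-turbo",
    temperature=0.7,
    max_retries=0,
    timeout=2
)

# Same provider as the primary: helps with model-level problems, not provider outages.
fallback_model_1 = ChatOpenAI(
    model="gpt-3.5-turbo",
    temperature=0.7,
    max_retries=0,
    timeout=2
)

# Different provider and API key: an independent failure domain.
fallback_model_2 = ChatAnthropic(
    model="claude-3-sonnet-20240229",
    temperature=0.7,
    max_retries=0,
    timeout=2
)

# Each chain takes the same input ({"topic": ...}) and returns a string.
primary_chain = prompt | primary_model | output_parser

fallback_chain_1 = prompt | fallback_model_1 | output_parser

fallback_chain_2 = prompt | fallback_model_2 | output_parser

# Fallbacks are tried in list order after the primary chain raises.
resilient_chain = primary_chain.with_fallbacks(
    [fallback_chain_1, fallback_chain_2]
)

try:
    result = resilient_chain.invoke({"topic": "quantum computing"})
    print(result)
except Exception as e:
    # Reached only when the primary and every fallback have failed.
    print(f"All chains failed: {type(e).__name__}: {str(e)[:100]}")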

Output

All chains failed: AuthenticationError: Incorrect API key provided. You can find your API key at https://platform.openai.com/account/api-keys.

What just happened?

The code created a primary chain (GPT-4) with two fallback chains (GPT-3.5 and Claude). When `invoke()` was called, the primary chain failed due to an invalid OpenAI API key. LangChain caught that exception and attempted the first fallback (GPT-3.5, also on OpenAI), which failed for the same reason because it shares the same invalid OpenAI key. It then attempted the second fallback (Claude), which failed due to the invalid Anthropic key. Since every chain was exhausted, the exception from the first failure was re-raised and caught by the try-except block, which is why the output shows the OpenAI error.

Common gotcha

Developers often assume fallbacks will silently activate if the model is slow or returns low-quality output. Fallbacks only trigger on exceptions (timeout, authentication, rate limit, network error), not on bad responses. If your primary model returns nonsense but doesn't error, the fallback never runs. To trigger fallbacks on output quality, add an explicit validation step inside the primary chain, after the model, that raises an exception when the output is unacceptable. Also, fallbacks don't inherit the input/output schema from the primary chain: every fallback receives the same input as the primary, so each chain must be independently valid for that input signature, or you'll get schema mismatches that surface only when the fallback finally runs.

Error recovery

AuthenticationError

Invalid API key for the model. Verify the environment variable name and value. Check that different fallback chains use different API key variables to avoid cascading auth failures.

RateLimitError

Model hit API rate limit. This is a legitimate failure mode where fallbacks shine: just ensure fallback models have separate rate limit pools (different API key or different provider entirely).

TimeoutError

Request exceeded the `timeout` parameter. Increase timeout on fallback chains, or use a fallback with lower latency (e.g., smaller model or local inference).

InvalidRequestError

The prompt or parameters are malformed (the openai 1.x SDK names this `BadRequestError`). Fallbacks won't fix this: every chain receives the same bad input, so each is likely to fail the same way. Add input validation before the chain.

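Schema mismatch (KeyError, TypeError)

A fallback chain has an incompatible input/output schema. This usually surfaces as a `KeyError` for a missing prompt variable or a type error further down the chain rather than as a dedicated exception class. Ensure all chains accept the same input keys and return the same output type.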

Experienced dev note

In production, don't just chain fallbacks blindly: instrument each fallback with a counter or log warning when it activates. You need visibility into *why* your primary chain failed and *how often*. This is your canary for problems: if you're hitting fallbacks constantly, your primary model is degraded or your traffic spiked. Silent fallback success is a debt; you're paying with latency/cost and not knowing it. Also, test your fallbacks explicitly: many teams add fallback chains in development and never trigger them, so they fail catastrophically in production when needed most.

Check your understanding

If your primary OpenAI chain has a 2-second timeout and your first fallback (also OpenAI) has a 5-second timeout, why might the fallback still not help during a genuine outage on OpenAI's infrastructure? What would you need to change to fix this?

Answer hint

A correct answer recognizes that both chains depend on the same provider (OpenAI), so an infrastructure outage at OpenAI affects both. The longer timeout on the fallback doesn't matter if the service is down. The fix requires a fallback using a *different provider* entirely (e.g., Anthropic, local model), not just a different model parameter.

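VERSION The `with_fallbacks()` method is available and stable in langchain-core 0.3.x (current as of April 2026). It is not new in 0.3: the method has been part of the LCEL runnable interface since earlier releases, so older LCEL code can use it too. Only pre-LCEL legacy chains required manual error handling or now-deprecated chain composition patterns. The LCEL pipe syntax with fallbacks is the modern standard.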

Learn streaming with fallbacks: how to handle partial token output when a streaming chain fails mid-generation, and which fallback chain state you need to maintain across stream checkpoints.
