
Timeout handling: preventing hanging requests

What you will learn

Use request timeouts and async patterns to prevent LangChain chains from blocking indefinitely on slow or unresponsive LLM APIs.

Why this matters

In production, a single hanging request to an LLM API can exhaust thread pools, trigger cascading timeouts, and degrade service availability. Without proper timeout handling, your application becomes vulnerable to slow API responses, network issues, or rate-limit backoff delays that silently freeze entire request pipelines.

Skip if: you are running intentionally long tasks, such as large document-processing or fine-tuning jobs, that legitimately take several minutes; do not force a short timeout on them. Also, do not use client-side timeouts as a substitute for proper retry logic: timeouts interrupt work; retries recover from transient failures.

Explanation

What it is: Timeout handling in LangChain means setting explicit maximum durations for LLM API calls and chain execution. If a request exceeds that duration, it is forcibly cancelled and raises an exception, preventing the request from hanging indefinitely.

How it works mechanically: LangChain's provider integrations (langchain-openai, langchain-anthropic, etc.) accept a timeout parameter that is passed down to the provider SDK's underlying HTTP client, typically httpx. When you invoke a chain with .invoke() and the LLM does not respond within the timeout window, the SDK raises a timeout error. For langchain-openai this is openai.APITimeoutError, which wraps the underlying httpx.TimeoutException. For async chains you can additionally wrap .ainvoke() in asyncio.wait_for() to enforce a deadline at the chain level, separate from the HTTP-level timeouts.
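The two layers can be seen without an API key. In this sketch, a hypothetical fake_llm_call() coroutine (asyncio.sleep() standing in for network latency) replaces the LLM. Each call has its own per-call limit, playing the role of an HTTP timeout, and only the outer asyncio.wait_for() bounds the two-step chain as a whole. Timings are illustrative.

```python
import asyncio


async def fake_llm_call(seconds: float) -> str:
    # Hypothetical stand-in for one LLM HTTP call that takes `seconds` to answer.
    await asyncio.sleep(seconds)
    return f"answered after {seconds}s"


async def two_step_chain(per_call_timeout: float) -> list[str]:
    # Each call gets its own limit, playing the role of an HTTP-level timeout.
    first = await asyncio.wait_for(fake_llm_call(0.2), timeout=per_call_timeout)
    second = await asyncio.wait_for(fake_llm_call(0.2), timeout=per_call_timeout)
    return [first, second]


async def run_with_deadline(chain_deadline: float) -> str:
    try:
        # The outer wait_for bounds the whole chain, not each individual call.
        await asyncio.wait_for(two_step_chain(per_call_timeout=0.3), timeout=chain_deadline)
        return "completed"
    except asyncio.TimeoutError:
        return "chain deadline exceeded"


print(asyncio.run(run_with_deadline(1.0)))  # completed
print(asyncio.run(run_with_deadline(0.3)))  # chain deadline exceeded
```

Each call finishes inside its 0.3-second limit, yet the two calls together overrun a 0.3-second chain deadline, which per-call timeouts alone cannot detect.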

When to use it: Always set timeouts on production chains. Use shorter timeouts (5–15 seconds) for user-facing endpoints, longer ones (30–60 seconds) for background batch processing. Combine timeouts with exponential backoff retry logic so transient failures recover gracefully.

Analogy

A timeout is like setting an alarm when you call customer service: if no one answers within 30 seconds, you hang up and try again, rather than holding forever. Without the alarm, you wait indefinitely and miss other calls.

Code

python

import asyncio
from typing import Optional

import openai
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI

template = "Answer this in one sentence: {question}"
prompt = ChatPromptTemplate.from_template(template)

# timeout applies to each HTTP attempt; max_retries=2 allows up to 3 attempts,
# so a persistently slow API can take ~15 s plus backoff before the error surfaces.
llm_with_timeout = ChatOpenAI(
    model="gpt-4o-mini",
    timeout=5.0,
    max_retries=2,
)

chain = prompt | llm_with_timeout | StrOutputParser()

try:
    result = chain.invoke({"question": "What is 2+2?"})
    print(f"Success: {result}")
except openai.APITimeoutError:
    # The OpenAI SDK wraps httpx.TimeoutException in openai.APITimeoutError,
    # so catching the httpx class directly would miss it.
    print("Request timed out (5 s per attempt, retries exhausted)")
except Exception as e:
    print(f"Error: {type(e).__name__}: {e}")


async def async_call_with_chain_timeout(question: str, timeout_seconds: float = 10.0) -> Optional[str]:
    try:
        # wait_for bounds the whole chain, including retries and backoff sleeps,
        # and cancels the in-flight call once the deadline passes.
        return await asyncio.wait_for(
            chain.ainvoke({"question": question}),
            timeout=timeout_seconds,
        )
    except asyncio.TimeoutError:
        print(f"Chain timed out after {timeout_seconds} seconds")
        return None
    except Exception as e:
        print(f"Error: {type(e).__name__}: {e}")
        return None


result = asyncio.run(async_call_with_chain_timeout("What is the capital of France?", timeout_seconds=10.0))
if result:
    print(f"Async result: {result}")

Output

Success: 2 + 2 = 4
Async result: The capital of France is Paris.

What just happened?

The code created a ChatOpenAI instance with a 5-second timeout per HTTP attempt and up to 2 retries. The synchronous chain completed and returned an answer. Then an async wrapper used asyncio.wait_for() to enforce a 10-second deadline on the entire chain invocation. Both calls finished within their limits, so both printed results. Had either run out of time, the corresponding exception handler would have caught it.

Common gotcha

Many developers set timeout on the LLM object and assume it caps the whole call. It does not: in langchain-openai, timeout is handed to the underlying httpx client and bounds individual HTTP operations (connecting, and each wait for data on the socket), not the total duration of the call. Time spent outside an open request, such as sleeping in retry backoff or waiting in a client-side rate limiter or queue, is not covered at all. For true end-to-end protection, wrap async invocations with asyncio.wait_for(), or use a separate process-level limit (e.g., Celery task time limits) for synchronous work. Also, retries do not share the timeout budget: each retry attempt gets a fresh timeout, so with timeout=5.0 and max_retries=2 a persistently slow API can hold the caller for roughly 15 seconds plus backoff before the error surfaces.
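For synchronous code there is no asyncio.wait_for(). A thread-based alternative bounds how long the caller waits, retries and backoff included, but it does not stop the underlying work; that is why a process-level limit remains the stronger option when runaway work must actually be killed. This minimal sketch assumes a hypothetical blocking slow_invoke() in place of chain.invoke().

```python
import concurrent.futures
import time
from typing import Optional


def slow_invoke(seconds: float) -> str:
    # Hypothetical stand-in for a blocking chain.invoke() call.
    time.sleep(seconds)
    return "done"


# One shared pool; a timed-out call keeps running in its worker thread.
executor = concurrent.futures.ThreadPoolExecutor(max_workers=4)


def invoke_with_deadline(seconds: float, deadline: float) -> Optional[str]:
    future = executor.submit(slow_invoke, seconds)
    try:
        # result() stops *waiting* after `deadline` seconds, whatever the call is doing.
        return future.result(timeout=deadline)
    except concurrent.futures.TimeoutError:
        # The worker thread is NOT killed; the call finishes in the background.
        future.cancel()  # only has an effect if the call has not started yet
        return None


print(invoke_with_deadline(0.05, deadline=1.0))  # done
print(invoke_with_deadline(0.5, deadline=0.1))   # None: caller freed after ~0.1 s
```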

Error recovery

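openai.APITimeoutError (wrapping httpx.TimeoutException)

Raised by langchain-openai, through the OpenAI SDK, when an HTTP attempt to the LLM API exceeds timeout. The SDK wraps the underlying httpx.TimeoutException, so catch openai.APITimeoutError rather than the httpx class. Fix: increase the timeout parameter if legitimate slowness is expected, or raise max_retries so the client retries transient delays with its built-in exponential backoff.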

asyncio.TimeoutError

Raised when the asyncio.wait_for() deadline is exceeded (on Python 3.11+, asyncio.TimeoutError is an alias of the built-in TimeoutError). Fix: increase the timeout passed to asyncio.wait_for(), or restructure the chain to run independent calls in parallel with asyncio.gather() instead of invoking them sequentially.
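The parallelization fix can be checked with simulated calls. In this sketch a hypothetical fake_llm_call() sleeps instead of calling an API: three 0.2-second calls run back to back overrun a 0.4-second deadline, while the same calls run concurrently with asyncio.gather() fit inside it.

```python
import asyncio
from typing import Awaitable, Callable, Optional


async def fake_llm_call(label: str, seconds: float) -> str:
    # Hypothetical stand-in for one independent LLM call.
    await asyncio.sleep(seconds)
    return label


async def sequential() -> list[str]:
    # Each call waits for the previous one: total time is the sum (~0.6 s).
    return [await fake_llm_call(name, 0.2) for name in ("summary", "keywords", "title")]


async def parallel() -> list[str]:
    # gather() runs the calls concurrently (~0.2 s) and keeps results in argument order.
    return await asyncio.gather(
        fake_llm_call("summary", 0.2),
        fake_llm_call("keywords", 0.2),
        fake_llm_call("title", 0.2),
    )


async def under_deadline(
    make_chain: Callable[[], Awaitable[list[str]]], deadline: float
) -> Optional[list[str]]:
    try:
        return await asyncio.wait_for(make_chain(), timeout=deadline)
    except asyncio.TimeoutError:
        return None


print(asyncio.run(under_deadline(sequential, 0.4)))  # None
print(asyncio.run(under_deadline(parallel, 0.4)))    # ['summary', 'keywords', 'title']
```

If the deadline does expire during gather(), wait_for() cancels the gather, which in turn cancels every call still in flight.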

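Connection errors (openai.APIConnectionError)

Raised when the network connection drops while the chain is executing. Fix: implement retry logic with Tenacity or rely on the client's built-in max_retries, which also retries connection errors. If you additionally enforce an outer deadline with asyncio.wait_for(), make it long enough to cover at least one full retry cycle.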
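To size an outer deadline, it helps to estimate how long a retried call can take before it finally fails. The hypothetical worst_case_seconds() helper below assumes every attempt runs to its full timeout and that backoff doubles from base_delay; real SDK backoff adds jitter and caps, so treat the result as an estimate.

```python
def worst_case_seconds(attempt_timeout: float, max_retries: int, base_delay: float) -> float:
    """Upper estimate of wall-clock time before a retried call finally fails.

    Assumes every attempt hits its full timeout and backoff doubles:
    base_delay, 2 * base_delay, 4 * base_delay, ... (no jitter, no cap).
    """
    attempts = max_retries + 1
    backoff = sum(base_delay * 2 ** n for n in range(max_retries))
    return attempts * attempt_timeout + backoff


print(worst_case_seconds(attempt_timeout=5.0, max_retries=2, base_delay=0.5))  # 16.5
print(worst_case_seconds(attempt_timeout=5.0, max_retries=1, base_delay=0.5))  # 10.5
```

With timeout=5.0, even a single retry needs about 10.5 seconds under these assumptions, so a 10-second asyncio.wait_for() deadline would cut it off.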

Experienced dev note

In production, timeout + max_retries alone is not enough. With timeout=5.0 and max_retries=2, a single request can make three 5-second attempts, consuming 15 seconds or more of wall-clock time while subsequent requests wait in the queue. Combine timeouts with a circuit breaker (for example, the pybreaker library) so that once the LLM is clearly down, later requests fail fast instead of each one burning through the entire retry budget. Also, measure your provider's actual p99 latency (not just p50) and set timeout to roughly p99 plus a 20% buffer; do not guess. If you use LangSmith tracing, timed-out runs may be reported as errors rather than as cancellations, which surprises some developers; check how they appear in your traces before building alerts on them.
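The circuit-breaker mechanism fits in a few lines. This is a hypothetical stdlib-only sketch, not pybreaker's API: SimpleCircuitBreaker, CircuitOpenError, fail_max, and reset_timeout are illustrative names. After fail_max consecutive failures the breaker opens and rejects calls without contacting the provider; once reset_timeout has passed it lets one trial call through.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitOpenError(RuntimeError):
    """Raised without calling the LLM while the breaker is open."""


class SimpleCircuitBreaker:
    """Hypothetical minimal breaker: opens after `fail_max` consecutive failures."""

    def __init__(self, fail_max: int = 3, reset_timeout: float = 30.0) -> None:
        self.fail_max = fail_max
        self.reset_timeout = reset_timeout
        self.failures = 0
        self.opened_at: Optional[float] = None

    def call(self, fn: Callable[[], T]) -> T:
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at < self.reset_timeout:
                raise CircuitOpenError("LLM circuit open; failing fast")
            self.opened_at = None  # half-open: allow one trial call
        try:
            result = fn()
        except Exception:
            self.failures += 1
            if self.failures >= self.fail_max:
                self.opened_at = time.monotonic()  # (re)open the circuit
            raise
        self.failures = 0  # any success closes the circuit again
        return result


calls = 0


def failing_llm_call() -> str:
    # Hypothetical call that always times out, standing in for a provider outage.
    global calls
    calls += 1
    raise TimeoutError("simulated LLM timeout")


breaker = SimpleCircuitBreaker(fail_max=2, reset_timeout=60.0)
for _ in range(5):
    try:
        breaker.call(failing_llm_call)
    except CircuitOpenError:
        print("fail fast")
    except TimeoutError:
        print("timed out")
print(calls)  # 2: after two failures the breaker stops calling the provider
```

A production library adds concerns this sketch omits, such as thread safety and choosing which exceptions count as failures.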

Check your understanding

If a chain has timeout=5.0 on the LLM and you wrap it with asyncio.wait_for(chain.ainvoke(...), timeout=10.0), which timeout takes precedence if the LLM API never responds, and why?

Answer hint

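On the first attempt, the HTTP timeout (5.0 seconds) fires first because it is the innermost layer: the socket read fails after about 5 seconds. Whether that error reaches your code depends on retries. With max_retries=0, the timeout exception propagates up at about 5 seconds and the 10-second chain deadline is never reached. With retries enabled (ChatOpenAI retries timeouts, and the default is 2 retries), the client backs off briefly and starts a fresh 5-second attempt; the 10-second asyncio.wait_for() deadline expires during that second attempt, so the caller sees asyncio.TimeoutError. Both timeouts are useful: the HTTP timeout protects individual requests; the chain timeout protects entire compound operations, including retries and multiple calls.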
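The answer can be reproduced on a scaled-down timeline. In this sketch, a hypothetical llm_with_retries() mimics a client that gives each attempt a fresh timeout and retries timeouts, InnerTimeout stands in for openai.APITimeoutError, and 0.2 s / 0.4 s play the roles of the 5 s HTTP timeout and the 10 s chain deadline. The 0.1 s backoff is exaggerated relative to real SDK delays to keep the timing margins wide.

```python
import asyncio


class InnerTimeout(Exception):
    """Hypothetical stand-in for openai.APITimeoutError."""


async def never_responds() -> str:
    await asyncio.sleep(3600)  # simulates an API that never answers
    return "unreachable"


async def llm_with_retries(attempt_timeout: float, max_retries: int, backoff: float) -> str:
    # Mimics an SDK that gives each attempt a fresh timeout and retries timeouts.
    for attempt in range(max_retries + 1):
        try:
            return await asyncio.wait_for(never_responds(), timeout=attempt_timeout)
        except asyncio.TimeoutError:
            if attempt == max_retries:
                raise InnerTimeout(f"gave up after {attempt + 1} attempt(s)")
            await asyncio.sleep(backoff)
    raise RuntimeError("unreachable")


async def which_fires(max_retries: int) -> str:
    try:
        # Scaled down: 0.2 s per attempt ("5 s"), 0.4 s outer deadline ("10 s").
        await asyncio.wait_for(
            llm_with_retries(attempt_timeout=0.2, max_retries=max_retries, backoff=0.1),
            timeout=0.4,
        )
    except InnerTimeout:
        return "inner HTTP-style timeout"
    except asyncio.TimeoutError:
        return "outer chain deadline"
    return "no timeout"


print(asyncio.run(which_fires(max_retries=0)))  # inner HTTP-style timeout
print(asyncio.run(which_fires(max_retries=2)))  # outer chain deadline
```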

VERSION .ainvoke() is part of the Runnable interface in langchain-core; legacy Chain classes also expose Chain.arun(), which is deprecated in favor of ainvoke(). In langchain-openai, ChatOpenAI accepts timeout as an alias for its request_timeout field, and older examples often use request_timeout directly; verify your installed version with pip show langchain-openai.

Implement exponential backoff retry logic with Tenacity to complement timeouts, ensuring transient failures are recovered without exceeding timeout boundaries.
