Timeout handling: preventing hanging requests
Why this matters
In production, a single hanging request to an LLM API can exhaust thread pools, trigger cascading timeouts, and degrade service availability. Without proper timeout handling, your application becomes vulnerable to slow API responses, network issues, or rate-limit backoff delays that silently freeze entire request pipelines.
Explanation
What it is: Timeout handling in LangChain means setting explicit maximum durations for LLM API calls and chain execution. If a request exceeds that duration, it is forcibly cancelled and raises an exception, preventing the request from hanging indefinitely.
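How it works mechanically: LangChain's provider packages (langchain-openai, langchain-anthropic, etc.) accept a timeout parameter and pass it to the provider SDK, which hands it to its HTTP client (httpx for the OpenAI and Anthropic SDKs). When you invoke a chain with .invoke() and the LLM does not respond within the timeout window, the HTTP client raises httpx.TimeoutException; the SDK typically re-raises it as its own error type (for example, openai.APITimeoutError), so that is the exception your code usually sees. Chain-level timeouts are a separate layer: on async chains you wrap the call in asyncio.wait_for() to bound the whole invocation, independently of the HTTP-level timeout.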
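To make the chain-level mechanism concrete, here is a minimal sketch that needs no API key. The fake_llm_call coroutine is a hypothetical stand-in for chain.ainvoke(); it only sleeps, so the example shows how asyncio.wait_for() behaves rather than how any provider responds.

```python
import asyncio


async def fake_llm_call(delay_seconds: float) -> str:
    """Hypothetical stand-in for chain.ainvoke(); sleeps instead of calling an API."""
    await asyncio.sleep(delay_seconds)
    return "response"


async def call_with_deadline(delay_seconds: float, timeout_seconds: float) -> str:
    """Return the response, or the string 'timed out' if the deadline passes first."""
    try:
        # wait_for cancels the inner coroutine once timeout_seconds elapse.
        return await asyncio.wait_for(fake_llm_call(delay_seconds), timeout=timeout_seconds)
    except asyncio.TimeoutError:
        return "timed out"


fast = asyncio.run(call_with_deadline(delay_seconds=0.01, timeout_seconds=1.0))
slow = asyncio.run(call_with_deadline(delay_seconds=1.0, timeout_seconds=0.05))
print(fast, "|", slow)
```

The slow call is cancelled at its deadline instead of running for the full second, which is the behaviour you want around a real chain invocation.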
When to use it: Always set timeouts on production chains. Use shorter timeouts (5–15 seconds) for user-facing endpoints, longer ones (30–60 seconds) for background batch processing. Combine timeouts with exponential backoff retry logic so transient failures recover gracefully.
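The retry advice above can be sketched as a small helper. This is an illustrative, hand-rolled version, not LangChain's or any SDK's built-in retry logic: retry_with_backoff and its parameters are hypothetical names, and it catches the built-in TimeoutError where real code would catch the provider's timeout exception.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    call: Callable[[], T],
    max_attempts: int = 3,
    base_delay: float = 0.5,
    max_delay: float = 8.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Retry `call` on TimeoutError, doubling the wait after each failed attempt."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    attempt = 0
    while True:
        try:
            return call()
        except TimeoutError:
            attempt += 1
            if attempt >= max_attempts:
                raise  # out of attempts: surface the timeout to the caller
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            # Jitter keeps many clients from retrying in lockstep.
            sleep(delay * random.uniform(0.5, 1.0))


# Demo: a call that times out twice, then succeeds. Sleeps are recorded, not performed.
attempts = {"count": 0}


def flaky_call() -> str:
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise TimeoutError("simulated timeout")
    return "ok"


delays = []
outcome = retry_with_backoff(flaky_call, sleep=delays.append)
print(outcome, attempts["count"], len(delays))
```

Passing sleep as a parameter keeps the demo instant and makes the backoff schedule easy to inspect in tests.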
Analogy
A timeout is like setting an alarm when you call customer service: if no one answers within 30 seconds, you hang up and try again, rather than holding forever. Without the alarm, you wait indefinitely and miss other calls.
Code
import asyncio
from typing import Optional

import httpx
import openai
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI

template = "Answer this in one sentence: {question}"
prompt = ChatPromptTemplate.from_template(template)

# timeout applies to each HTTP attempt; max_retries=2 allows up to three attempts.
llm_with_timeout = ChatOpenAI(
    model="gpt-4o-mini",
    timeout=5.0,
    max_retries=2,
)
chain = prompt | llm_with_timeout | StrOutputParser()

# Synchronous call, protected by the HTTP-level timeout only.
try:
    result = chain.invoke({"question": "What is 2+2?"})
    print(f"Success: {result}")
except (openai.APITimeoutError, httpx.TimeoutException):
    # The OpenAI SDK wraps httpx timeouts in openai.APITimeoutError.
    print("Request timed out (5-second limit per attempt)")
except Exception as e:
    print(f"Error: {type(e).__name__}: {e}")


async def async_call_with_chain_timeout(question: str, timeout_seconds: float = 10.0) -> Optional[str]:
    """Bound the whole chain invocation, retries included, with one deadline."""
    try:
        result = await asyncio.wait_for(
            chain.ainvoke({"question": question}),
            timeout=timeout_seconds,
        )
        return result
    except asyncio.TimeoutError:
        print(f"Chain timed out after {timeout_seconds} seconds")
        return None
    except Exception as e:
        print(f"Error: {type(e).__name__}: {e}")
        return None


result = asyncio.run(async_call_with_chain_timeout("What is the capital of France?", timeout_seconds=10.0))
if result is not None:
    print(f"Async result: {result}")

Output:
Success: 2 + 2 = 4
Async result: The capital of France is Paris.
What just happened?
The code created a ChatOpenAI LLM instance with a 5-second HTTP timeout and max 2 retries. The synchronous chain executed successfully and returned an answer. Then an async wrapper used asyncio.wait_for() to enforce a 10-second timeout on the entire chain invocation. Both requests completed within their timeout windows, so both succeeded and printed results. If either had exceeded its timeout, the respective exception handler would have caught it.
Common gotcha
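Many developers set timeout on the LLM object and assume it caps the whole call. It does not: the timeout passed to a LangChain LLM constructor is handed to the HTTP client and applies to each HTTP attempt (with httpx, a single number bounds each phase, such as connecting or waiting for a socket read, rather than the request as a whole). Time spent outside an attempt, such as retry backoff sleeps or queuing behind a rate limiter, is not covered, so the LLM-level timeout won't fire while the call is stuck there. For true end-to-end timeout protection, always wrap chain invocation with asyncio.wait_for() on async chains or use a separate process-level timeout (e.g., with Celery). Also note how retries interact with each layer: every retry gets a fresh LLM-level timeout, so the worst case is roughly timeout × (max_retries + 1) plus backoff, whereas an outer asyncio.wait_for() budget does not reset; all attempts share it.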
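One way to picture a shared end-to-end budget is a deadline that every attempt draws from. The sketch below is illustrative only: call_with_budget, always_hangs, and the injectable clock are hypothetical names, and the simulated attempt advances a fake clock instead of making an HTTP request.

```python
import time
from typing import Callable


def call_with_budget(
    attempt: Callable[[float], str],
    total_budget: float,
    per_attempt_timeout: float,
    max_attempts: int = 3,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Run up to max_attempts, never granting an attempt more time than remains overall."""
    deadline = clock() + total_budget
    for _ in range(max_attempts):
        remaining = deadline - clock()
        if remaining <= 0:
            break  # the shared budget is spent; stop retrying
        try:
            return attempt(min(per_attempt_timeout, remaining))
        except TimeoutError:
            continue
    raise TimeoutError("end-to-end budget or attempts exhausted")


# Demo with a fake clock: each simulated attempt uses its whole timeout, then fails.
now = {"t": 0.0}
granted = []


def always_hangs(timeout: float) -> str:
    granted.append(timeout)
    now["t"] += timeout
    raise TimeoutError("simulated hang")


try:
    call_with_budget(always_hangs, total_budget=12.0, per_attempt_timeout=5.0,
                     max_attempts=5, clock=lambda: now["t"])
except TimeoutError:
    print(granted)  # prints [5.0, 5.0, 2.0]
```

The third attempt is granted only the 2 seconds left in the budget, and the fourth and fifth never start, so total wall-clock time stays within 12 seconds regardless of the retry count.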
Error recovery
httpx.TimeoutException, asyncio.TimeoutError, ConnectionError during timeout
Experienced dev note
In production, timeout + max_retries alone is not enough. With timeout=5.0 and max_retries=2, a request can be attempted three times at 5 seconds each, consuming 15 seconds of wall-clock time (plus backoff) while subsequent requests wait in the queue. Instead, combine timeouts with a circuit breaker pattern (for example, with the pybreaker library) so that if the LLM is genuinely down, you fail fast after the first or second failure instead of burning through the entire retry budget on every request. Also, measure actual p99 latencies from your LLM provider (not just p50) and set the timeout to p99 plus a 20% buffer; do not guess. If you use LangSmith tracing, timeouts will show up as COMPLETED_WITH_ERROR, not CANCELLED, which surprises developers.
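The circuit breaker idea can be sketched without any dependency. This is a deliberately minimal, hand-rolled illustration and not pybreaker's API: SimpleCircuitBreaker, CircuitOpenError, fail_max, and reset_timeout are names chosen for this example, and hanging_llm simulates a provider that always times out.

```python
import time
from typing import Callable


class CircuitOpenError(Exception):
    """Raised when the breaker is open and the call is skipped."""


class SimpleCircuitBreaker:
    def __init__(self, fail_max: int = 2, reset_timeout: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.fail_max = fail_max
        self.reset_timeout = reset_timeout
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def call(self, func: Callable[[], str]) -> str:
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_timeout:
                raise CircuitOpenError("failing fast; provider looks down")
            # Cool-down elapsed: allow one trial call (half-open state).
            self.opened_at = None
            self.failures = self.fail_max - 1
        try:
            result = func()
        except TimeoutError:
            self.failures += 1
            if self.failures >= self.fail_max:
                self.opened_at = self.clock()  # open the circuit
            raise
        self.failures = 0  # any success closes the circuit fully
        return result


# Demo: after two timeouts the breaker stops calling the provider at all.
calls = {"count": 0}


def hanging_llm() -> str:
    calls["count"] += 1
    raise TimeoutError("simulated LLM timeout")


breaker = SimpleCircuitBreaker(fail_max=2, reset_timeout=30.0)
outcomes = []
for _ in range(4):
    try:
        breaker.call(hanging_llm)
    except CircuitOpenError:
        outcomes.append("skipped")
    except TimeoutError:
        outcomes.append("timeout")
print(outcomes, calls["count"])
```

Only the first two requests pay the timeout cost; the rest fail immediately until the cool-down passes, which is the fail-fast behaviour described above.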
Check your understanding
If a chain has timeout=5.0 on the LLM and you wrap it with asyncio.wait_for(chain.ainvoke(...), timeout=10.0), which timeout takes precedence if the LLM API never responds, and why?
Answer hint
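It depends on retries. With retries disabled (max_retries=0), the HTTP timeout (5.0 seconds) fires first because it is the innermost layer: the attempt fails after 5 seconds, raising an exception that propagates up, and the chain-level timeout (10.0) is never reached. With retries enabled, each attempt gets its own 5 seconds, so a second attempt is still in flight when the 10-second chain-level timeout fires and cancels the call with asyncio.TimeoutError. Both timeouts are useful: the HTTP timeout protects individual requests; the chain timeout protects entire compound operations that may make multiple calls.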
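The interaction between the two layers can be checked with a scaled-down simulation (0.1 s standing in for the 5-second HTTP timeout, 0.2 s for the 10-second chain timeout). Everything here is a hypothetical stand-in: AttemptTimeout plays the role of the HTTP client's timeout error, and never_responding_llm models an API that never answers, with no backoff between attempts.

```python
import asyncio


class AttemptTimeout(Exception):
    """Hypothetical stand-in for the HTTP client's per-attempt timeout error."""


async def never_responding_llm(per_attempt_timeout: float, max_retries: int) -> None:
    """Simulate an API that never answers: every attempt hangs until its own timeout."""
    for _ in range(max_retries + 1):
        try:
            await asyncio.wait_for(asyncio.sleep(3600), timeout=per_attempt_timeout)
        except asyncio.TimeoutError:
            continue  # this attempt timed out; move on to the next one
    raise AttemptTimeout("all attempts timed out")


async def which_fires(max_retries: int) -> str:
    """Report which layer ends the call: the per-attempt or the chain-level timeout."""
    try:
        await asyncio.wait_for(never_responding_llm(0.1, max_retries), timeout=0.2)
    except AttemptTimeout:
        return "per-attempt timeout"
    except asyncio.TimeoutError:
        return "chain-level timeout"
    return "completed"


no_retries = asyncio.run(which_fires(max_retries=0))
with_retries = asyncio.run(which_fires(max_retries=2))
print(no_retries, "|", with_retries)
```

With no retries the inner layer ends the call; with two retries the attempts would need 0.3 s in total, so the outer 0.2 s deadline cancels the call first.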
Version note: in older LangChain releases, Chain.ainvoke() did not exist; on those versions use Chain.arun() instead (deprecated in 0.3.x). Also, timeout parameter names changed in langchain-openai >= 1.0.0: older versions used request_timeout. Verify your installed version with pip show langchain-openai.