How to set timeout in LiteLLM
Quick answer
In LiteLLM, set the timeout by passing the timeout parameter (in seconds) on the completion call. This controls how long the client waits for a response before raising a timeout error.
Quick fix: add a timeout parameter with a suitable value (e.g., 30 seconds) to your LiteLLM calls.
Why this happens
By default, LiteLLM may not enforce a short timeout on API calls or local inference requests. Without an explicit timeout, calls can hang indefinitely if the model server is slow or unresponsive, causing your application to freeze.
Example of code without an explicit timeout:

from litellm import completion

# No timeout set: the call can block indefinitely
response = completion(
    model="ollama/llama3.2",
    messages=[{"role": "user", "content": "Hello"}],
)
print(response.choices[0].message.content)

Output: hangs indefinitely if the server does not respond.
The fix
Set the timeout parameter (in seconds) on the completion call. The call will then raise litellm.Timeout if the response takes longer than the specified time.
This works because the underlying HTTP client respects the timeout and aborts stalled requests.
import litellm
from litellm import completion

try:
    response = completion(
        model="ollama/llama3.2",
        messages=[{"role": "user", "content": "Hello"}],
        timeout=30,  # abort if no response within 30 seconds
    )
    print(response.choices[0].message.content)
except litellm.Timeout:
    print("Request timed out after 30 seconds")

Output:
Hello! How can I assist you today?
Preventing it in production
- Always set a reasonable timeout to avoid indefinite hangs.
- Implement retry logic with exponential backoff to handle transient timeouts gracefully.
- Monitor latency and error rates to adjust timeout values dynamically.
- Use circuit breakers or fallback responses to maintain user experience during outages.
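The retry-with-backoff advice above can be sketched with only the standard library. call_with_retries and flaky are illustrative names, not LiteLLM APIs; in a real LiteLLM app you would catch litellm.Timeout instead of the generic TimeoutError:

```python
import random
import time

def call_with_retries(call, max_attempts=3, base_delay=0.5):
    """Retry `call` on timeout, with exponential backoff plus jitter."""
    for attempt in range(max_attempts):
        try:
            return call()
        except TimeoutError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the timeout to the caller
            # Double the delay each attempt; jitter avoids synchronized retries.
            time.sleep(base_delay * (2 ** attempt) + random.uniform(0, 0.1))

# Usage with an illustrative flaky call that succeeds on the third attempt:
attempts = 0
def flaky():
    global attempts
    attempts += 1
    if attempts < 3:
        raise TimeoutError("simulated slow backend")
    return "Hello! How can I assist you today?"

result = call_with_retries(flaky, base_delay=0.05)
print(result)
```

Note that LiteLLM also ships retry support of its own (e.g., a num_retries argument on completion calls), so in practice you may not need a hand-rolled loop like this.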
Key Takeaways
- Always specify a timeout in LiteLLM calls to avoid hanging requests.
- Use try-except blocks to catch litellm.Timeout and handle it gracefully.
- Combine timeouts with retry and fallback strategies for robust production AI apps.