Timeout errors: large responses
Why this matters
Large API responses can exceed default timeout windows, leaving your application frozen waiting for a response that may never arrive. Understanding timeout configuration prevents production incidents where requests hang indefinitely.
Explanation
The OpenAI Python SDK enforces request timeouts to prevent indefinite blocking. By default, the client uses a 600-second (10 minute) timeout for most operations. When you request large completions: particularly with max_tokens set high or streaming disabled: the response can take longer than this window, causing a httpx.TimeoutException.
Under the hood, the timeout clock starts when the request leaves your machine and stops when the final response byte arrives at your client. This includes network latency, OpenAI's processing time, and token generation time. If your completion requires generating 3000+ tokens, the API must run inference for longer, increasing timeout risk.
The fix is twofold: (1) Set an explicit timeout parameter when instantiating the OpenAI client, and (2) cap max_tokens to a reasonable value for your use case. For example, a 30-second timeout works for most chat completions; a 120-second timeout is safer for longer generations.
Request code
import os
from openai import OpenAI
client = OpenAI(
api_key=os.getenv('OPENAI_API_KEY'),
timeout=30.0
)
try:
response = client.chat.completions.create(
model='gpt-4o-mini',
messages=[
{
'role': 'user',
'content': 'Write a 500-word essay on artificial intelligence.'
}
],
max_tokens=1500,
temperature=0.7
)
print(f'Response: {response.choices[0].message.content}')
print(f'Tokens used: {response.usage.total_tokens}')
except Exception as e:
print(f'Error occurred: {type(e).__name__}: {str(e)}') Authentication
Ensure your OPENAI_API_KEY environment variable is set before running the code. The SDK reads this at client instantiation time: `export OPENAI_API_KEY=sk-... && python your_script.py`
Response shape
| Field | Description |
|---|---|
id | chatcmpl-..., unique identifier for this completion |
object | text_completion, response type marker |
created | 1706123456, Unix timestamp when response was generated |
model | gpt-4o-mini, the model that processed the request |
choices | [object Object] |
usage | [object Object] |
Field guide
finish_reason Critical field: 'stop' means the model finished naturally; 'length' means max_tokens was hit and generation was cut short. Check this before assuming you have a complete response.
usage.total_tokens What determines your billing: every token here costs money. Monitor this to understand your API spending.
id Useful for debugging or logging: include this in error reports to OpenAI support
Setup trap
The timeout is set at client instantiation, not per-request. If you create the client without a timeout, all subsequent calls inherit that setting. Developers often create the client once at module load time, then forget timeout is configured (or not configured). Always verify: `print(client.timeout)` after instantiation.
Cost
Each failed timeout attempt still counts toward your request quota and rate limits, even though no tokens were generated. A sustained series of timeouts wastes your rate limit window without producing output. In high-volume systems, timeouts directly translate to wasted API quota and degraded throughput.
Rate limits
If you're hitting rate limits (429 errors), aggressive timeouts can worsen the situation. When rate-limited, requests queue on OpenAI's servers, making them take longer. A 30-second timeout may fail on the first retry of a rate-limited request. Use exponential backoff with generous timeouts (60-120 seconds) in production.
Common gotcha
Setting timeout=None disables the timeout completely, which seems like a solution but creates a new problem: your application can hang forever if the network breaks mid-response. Always set a finite timeout, even if it's large (e.g., 300 seconds for very long generations). Also, a 30-second timeout is too aggressive for responses over 1000 tokens: increase to 60-120 seconds for safety.
Error recovery
httpx.TimeoutExceptionAPIConnectionErrorRateLimitErrorExperienced dev note
Timeout configuration is invisible until it fails catastrophically. In production, set timeout to 2-3x your expected response time (e.g., if responses average 15 seconds, use 45 seconds). Monitor logs for actual response times: log `time_elapsed = response.created - time.time()` to measure real-world performance. This data drives smarter timeout tuning. Also: never timeout in CI/CD pipelines: tests may be slow due to shared runners. Use a much higher timeout (300 seconds) in test code.
Check your understanding
Your application generates legal documents with max_tokens=2000. In production, 95% of requests complete in 8 seconds, but the tail 5% take 20-30 seconds. A teammate proposes setting timeout=10 to 'keep things snappy.' Why is this dangerous, and what timeout would you set instead?
Show answer hint
You need to account for network variability and the full tail of response times, not just the median. A 10-second timeout will fail the slow 5% of requests unnecessarily, degrading availability. Set it higher than your observed p99 latency (maybe 45-60 seconds) to catch only true failures, not normal slowness.