API Beginner easy · 5 min

Timeout errors: large responses

What you will learn

Set explicit timeout and max_tokens limits when requesting large completions from OpenAI to prevent network hangs and unexpected failures.

Why this matters

Large API responses can exceed default timeout windows, leaving your application frozen waiting for a response that may never arrive. Understanding timeout configuration prevents production incidents where requests hang indefinitely.

Skip if: You are working with small, predictable responses (under 500 tokens) in a development environment. Production code should still specify timeouts regardless of expected response size, because network conditions are unpredictable.

Explanation

The OpenAI Python SDK enforces request timeouts to prevent indefinite blocking. By default, the client uses a 600-second (10-minute) timeout and retries a failed request twice. When you request large completions, particularly with max_tokens set high or streaming disabled, the response can take longer than the configured window. The SDK then raises openai.APITimeoutError, which wraps the underlying httpx.TimeoutException.

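Under the hood, the SDK hands the timeout to httpx, which applies a float value to each phase of the request (connect, write, read, pool) rather than as a single end-to-end deadline. With streaming disabled, the client typically receives nothing until the whole completion is ready, so almost all of the wait falls into one read: network latency, OpenAI's processing time, and token generation time together. If your completion requires generating 3000+ tokens, the API must run inference for longer, increasing timeout risk.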

The fix is twofold: (1) set an explicit timeout parameter when instantiating the OpenAI client, and (2) cap max_tokens at a reasonable value for your use case. For example, a 30-second timeout works for most short chat completions; a 120-second timeout is safer for longer generations.

Request code

python

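import os
from openai import APITimeoutError, OpenAI

# Client-wide default: requests made through this client use a 30-second timeout.
client = OpenAI(
    api_key=os.getenv('OPENAI_API_KEY'),
    timeout=30.0
)

try:
    response = client.chat.completions.create(
        model='gpt-4o-mini',
        messages=[
            {
                'role': 'user',
                'content': 'Write a 500-word essay on artificial intelligence.'
            }
        ],
        max_tokens=1500,  # hard cap on generated tokens
        temperature=0.7
    )
    print(f'Response: {response.choices[0].message.content}')
    print(f'Tokens used: {response.usage.total_tokens}')
except APITimeoutError as e:
    # Raised once the timeout and the SDK's automatic retries are exhausted.
    print(f'Request timed out: {e}')
except Exception as e:
    print(f'Error occurred: {type(e).__name__}: {str(e)}')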

Authentication

Ensure your OPENAI_API_KEY environment variable is set before running the code. The SDK reads this at client instantiation time: `export OPENAI_API_KEY=sk-... && python your_script.py`

Response shape

Field	Description
`id`	chatcmpl-..., unique identifier for this completion
`object`	chat.completion, response type marker
`created`	1706123456, Unix timestamp when response was generated
`model`	gpt-4o-mini, the model that processed the request
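`choices`	List of choice objects, each with `index`, `message` (`role`, `content`), and `finish_reason`
`usage`	Token counts: `prompt_tokens`, `completion_tokens`, `total_tokens`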

Field guide

finish_reason

Critical field: 'stop' means the model finished naturally; 'length' means max_tokens was hit and generation was cut short. Check this before assuming you have a complete response.

usage.total_tokens

What determines your billing: every token here costs money. Monitor this to understand your API spending.

id

Useful for debugging or logging: include this in error reports to OpenAI support.
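The finish_reason check described in this field guide can be sketched as a small helper. This is a minimal illustration, not part of the SDK: `completion_text` is a hypothetical name, and the `fake` object is a stand-in shaped like an SDK response so the snippet runs without a network call.

```python
from types import SimpleNamespace


def completion_text(response):
    """Return (text, truncated) for the first choice of a chat completion."""
    choice = response.choices[0]
    # 'length' means generation stopped because max_tokens was reached.
    truncated = choice.finish_reason == "length"
    return choice.message.content, truncated


# Stand-in object with the same attribute layout as a real response.
fake = SimpleNamespace(
    choices=[
        SimpleNamespace(
            finish_reason="length",
            message=SimpleNamespace(content="Artificial intelligence is"),
        )
    ]
)

text, truncated = completion_text(fake)
if truncated:
    print("Warning: output hit max_tokens and was cut short")
```

With a real client, pass the object returned by `client.chat.completions.create(...)` instead of `fake`.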

Setup trap

The timeout set at client instantiation is the default for every call made through that client. A single request can override it, either by passing `timeout=` to `create()` or through `client.with_options(timeout=...)`, but any call that does not override it inherits the client's setting. Developers often create the client once at module load time, then forget whether a timeout is configured. Always verify: `print(client.timeout)` after instantiation.
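In the 1.x SDK a single call can also run with its own timeout while the client default stays unchanged. The sketch below wraps that in a helper; `create_with_timeout` is a hypothetical name, while `with_options` is the SDK method that returns a copy of the client with the given overrides.

```python
def create_with_timeout(client, timeout_s, **request_kwargs):
    """Issue one chat completion with its own timeout.

    The client's default timeout is not modified, because with_options()
    returns a copy of the client carrying the override.
    """
    scoped = client.with_options(timeout=timeout_s)
    return scoped.chat.completions.create(**request_kwargs)
```

A long generation could then be requested as `create_with_timeout(client, 120.0, model='gpt-4o-mini', messages=[...], max_tokens=1500)` while every other call keeps the shorter default.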

Cost

A request that times out can still count toward your request quota and rate limits, even though your application received no output. The SDK's automatic retries multiply this effect, since each retry is another request. A sustained series of timeouts wastes your rate limit window without producing output. In high-volume systems, timeouts directly translate to wasted API quota and degraded throughput.

Rate limits

If you're hitting rate limits (429 errors), aggressive timeouts can worsen the situation. Every timed-out request that gets retried is another request against your limit, and under heavy load requests can take longer to complete, so a 30-second timeout may fail again on the first retry. Use exponential backoff with generous timeouts (60-120 seconds) in production.

Common gotcha

Setting timeout=None disables the timeout completely, which seems like a solution but creates a new problem: your application can hang forever if the network breaks mid-response. Always set a finite timeout, even if it's large (e.g., 300 seconds for very long generations). Also, a 30-second timeout is too aggressive for responses over 1000 tokens: increase to 60-120 seconds for safety.

Error recovery

APITimeoutError (wraps httpx.TimeoutException)

Request exceeded the timeout window. Increase the timeout value (e.g., from 30 to 90 seconds) or reduce max_tokens. If the response is truly needed, retry with backoff: `import time; time.sleep(2 ** attempt); client.chat.completions.create(...)`

APIConnectionError

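Network connectivity issue. APITimeoutError is a subclass of this error, so catch the timeout first if you handle the two differently. Check internet connection and API endpoint availability. An invalid or revoked OPENAI_API_KEY raises AuthenticationError instead, not a connection error.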

RateLimitError

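You've exceeded your request or token quota. Wait before retrying (the error response may indicate how long, for example through a Retry-After header). Use exponential backoff rather than immediate retries, and keep timeouts generous so that retried requests are not cut short.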
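The retry advice in this section can be sketched as a generic helper. This is a minimal illustration under stated assumptions: `backoff_delay` and `call_with_retries` are hypothetical names, the delays are arbitrary defaults, and `get_retry_after` is an optional callback you would write yourself to pull a server hint out of the exception. Because the SDK already retries on its own, consider constructing the client with `max_retries=0` if you use a loop like this, so the two mechanisms do not compound.

```python
import time


def backoff_delay(attempt, retry_after=None, base=1.0, cap=60.0):
    """Seconds to wait before retrying after failed attempt number `attempt` (0-based)."""
    if retry_after is not None:
        # A server-provided hint takes priority over the computed delay.
        return min(float(retry_after), cap)
    return min(base * (2 ** attempt), cap)


def call_with_retries(fn, retryable, max_attempts=4,
                      get_retry_after=None, sleep=time.sleep):
    """Call fn(); on a retryable exception, wait and try again."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except retryable as exc:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            hint = get_retry_after(exc) if get_retry_after else None
            sleep(backoff_delay(attempt, hint))
```

With the SDK, `fn` would be a zero-argument function wrapping `client.chat.completions.create(...)`, and `retryable` would be a tuple such as `(APITimeoutError, RateLimitError)`.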

Experienced dev note

Timeout configuration is invisible until it fails catastrophically. In production, set timeout to 2-3x your expected response time (e.g., if responses average 15 seconds, use 45 seconds). Monitor logs for actual response times: record `time.perf_counter()` before and after each call and log the difference, rather than relying on `response.created`, which is a server-side timestamp and not a measure of client-side latency. This data drives smarter timeout tuning. Also, avoid tight timeouts in CI/CD pipelines, where tests may be slow due to shared runners. Use a much higher timeout (300 seconds) in test code.
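One way to collect real response times is to time each call on the client side. The helper below is a small sketch; `timed` is a hypothetical name, and it works with any callable.

```python
import time


def timed(fn, *args, **kwargs):
    """Run fn and return (result, elapsed_seconds) as measured by the caller."""
    start = time.perf_counter()  # monotonic clock, unaffected by system clock changes
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start
```

For example, `response, elapsed = timed(client.chat.completions.create, model='gpt-4o-mini', messages=[...])` gives a latency figure you can log alongside `response.id`.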

Check your understanding

Your application generates legal documents with max_tokens=2000. In production, 95% of requests complete in 8 seconds, but the tail 5% take 20-30 seconds. A teammate proposes setting timeout=10 to 'keep things snappy.' Why is this dangerous, and what timeout would you set instead?

Answer hint

You need to account for network variability and the full tail of response times, not just the median. A 10-second timeout will fail the slow 5% of requests unnecessarily, degrading availability. Set it higher than your observed p99 latency (maybe 45-60 seconds) to catch only true failures, not normal slowness.
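The reasoning in this hint can be turned into a quick calculation over logged latencies. This is a sketch under assumptions: the sample data is made up to match the scenario in the question, the nearest-rank percentile is one of several common definitions, and the 2x headroom factor is an illustrative choice, not a rule.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile of a non-empty list of latencies in seconds."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def suggest_timeout(samples, pct=99, headroom=2.0):
    """Timeout = chosen percentile of observed latency times a headroom factor."""
    return percentile(samples, pct) * headroom


# Hypothetical log: 95 requests at 8 s, plus a slow tail of 20-30 s.
latencies = [8.0] * 95 + [20.0, 22.0, 25.0, 28.0, 30.0]
print(suggest_timeout(latencies))  # 56.0
```

A 10-second timeout would cut off every request in that slow tail, while 56 seconds lets them finish and still bounds a genuine hang.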

VERSION OpenAI SDK 1.x uses httpx for HTTP transport. Timeout is passed to the OpenAI constructor as a float (seconds) or as an `httpx.Timeout` object for per-phase control. In OpenAI SDK 0.x (deprecated), timeout handling was less explicit. Confirm you're on SDK 1.x with `pip show openai` to ensure timeout behavior matches this page.
