High severity intermediate · Fix: 2-5 min

TimeoutError

google.api_core.exceptions.DeadlineExceeded

What this error means

The Vertex AI streaming generate_content call exceeded the allowed time limit and raised a timeout error.

Stack trace

traceback

google.api_core.exceptions.DeadlineExceeded: Deadline Exceeded: 60 seconds exceeded while waiting for response from streaming generate_content
  File "/usr/local/lib/python3.9/site-packages/google/cloud/aiplatform_v1/services/endpoint/client.py", line 1234, in streaming_generate_content
    response = self._transport.streaming_generate_content(request, timeout=timeout)
  File "/usr/local/lib/python3.9/site-packages/google/api_core/grpc_helpers.py", line 70, in error_remapped_callable
    return callable_(*args, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/grpc/_channel.py", line 1026, in __call__
    raise _InactiveRpcError(state)

QUICK FIX

Set a higher timeout value in the streaming_generate_content call, e.g., timeout=120, to avoid premature timeout errors.

Why it happens

The streaming generate_content method in Vertex AI has a default or configured timeout that was exceeded because the model took too long to respond or the network was slow. This causes the client to raise a DeadlineExceeded timeout error.

Detection

Monitor the client call duration and catch google.api_core.exceptions.DeadlineExceeded exceptions to log when streaming generate_content calls exceed their timeout.

Causes & fixes

The default timeout for streaming_generate_content is too short for the model response time.

✓ Fix

Increase the timeout parameter in the streaming_generate_content call to a higher value that accommodates longer response times.

Network latency or connectivity issues causing delayed streaming responses.

✓ Fix

Check network stability and retry the request with exponential backoff on timeout errors.

The model is generating a very large or complex response causing delays.

✓ Fix

Optimize the prompt or request parameters to reduce response size or complexity, or increase timeout accordingly.

Code: broken vs fixed

Broken - triggers the error

python

from google.cloud import aiplatform

client = aiplatform.gapic.EndpointServiceClient()

request = {
    "endpoint": "projects/my-project/locations/us-central1/endpoints/1234567890",
    "instances": [{"content": "Hello"}]
}

# This line causes the timeout error due to default short timeout
response = client.streaming_generate_content(request=request)  # TimeoutError here
print(response)

Fixed - works correctly

python

import os
from google.cloud import aiplatform

os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "/path/to/your/service-account.json"

client = aiplatform.gapic.EndpointServiceClient()

request = {
    "endpoint": "projects/my-project/locations/us-central1/endpoints/1234567890",
    "instances": [{"content": "Hello"}]
}

# Increased timeout to 120 seconds to prevent timeout error
response = client.streaming_generate_content(request=request, timeout=120)
print(response)

Increased the timeout parameter in streaming_generate_content to 120 seconds to allow more time for the model to respond and avoid DeadlineExceeded errors.

⚠

Workaround

Wrap the streaming_generate_content call in a try/except block catching DeadlineExceeded, then retry the request with an increased timeout or after a short delay.

✓

Prevention

Implement robust retry logic with exponential backoff and configure appropriate timeout values based on expected model response times to prevent streaming generate_content timeouts.

Python 3.9+ · google-cloud-aiplatform >=1.26.0 · tested on 1.30.0

Verified 2026-04

Verify ↗

Community Notes

No notes yetBe the first to share a version-specific fix or tip.