TimeoutError
google.api_core.exceptions.DeadlineExceeded
Stack trace
google.api_core.exceptions.DeadlineExceeded: Deadline Exceeded: 60 seconds exceeded while waiting for response from streaming generate_content
  File "/usr/local/lib/python3.9/site-packages/google/cloud/aiplatform_v1/services/endpoint/client.py", line 1234, in streaming_generate_content
    response = self._transport.streaming_generate_content(request, timeout=timeout)
  File "/usr/local/lib/python3.9/site-packages/google/api_core/grpc_helpers.py", line 70, in error_remapped_callable
    return callable_(*args, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/grpc/_channel.py", line 1026, in __call__
    raise _InactiveRpcError(state)
Why it happens
The streaming generate_content call in Vertex AI runs under a client-side timeout (60 seconds in the trace above). When the model takes longer than that to respond, or the network delays the stream, the deadline passes before the response arrives and the client raises google.api_core.exceptions.DeadlineExceeded.
Detection
Time each streaming generate_content call and catch google.api_core.exceptions.DeadlineExceeded, so that every timeout is logged together with the elapsed duration and the timeout that was configured.
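A minimal sketch of that detection step, assuming the request is wrapped in a callable that accepts a timeout keyword and consumes the stream. The names timed_call and call and the logger name are hypothetical and not part of any Google library; the stand-in exception class exists only so the sketch runs where google-api-core is not installed.

```python
import logging
import time

try:
    from google.api_core.exceptions import DeadlineExceeded
except ImportError:
    # Stand-in so the sketch runs without google-api-core installed.
    class DeadlineExceeded(Exception):
        pass

logger = logging.getLogger("vertex.timeouts")  # hypothetical logger name


def timed_call(call, timeout):
    """Run call(timeout=timeout) and log the elapsed time if the deadline is exceeded."""
    start = time.monotonic()
    try:
        return call(timeout=timeout)
    except DeadlineExceeded:
        elapsed = time.monotonic() - start
        logger.warning(
            "DeadlineExceeded after %.1fs (configured timeout: %ss)", elapsed, timeout
        )
        raise  # let the caller decide whether to retry
```

The exception is re-raised after logging, so this wrapper only observes timeouts and leaves retry decisions to the caller.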
Causes & fixes
Cause: The default timeout for streaming_generate_content is too short for the model's response time.
Fix: Pass a larger timeout value to the streaming_generate_content call, one that covers the longest response you expect.
Cause: Network latency or connectivity problems delay the streamed response.
Fix: Check network stability and retry the request with exponential backoff when a timeout occurs.
Cause: The model is generating a very large or complex response, which takes longer to stream.
Fix: Adjust the prompt or request parameters to reduce the size or complexity of the response, or increase the timeout to match.
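To make the exponential backoff mentioned above concrete, here is a small sketch that only computes the waits between retries. The function name backoff_delays and its default values are illustrative assumptions, not settings taken from the Vertex AI client.

```python
import random


def backoff_delays(attempts, base=1.0, factor=2.0, cap=30.0, jitter=0.0):
    """Return the wait in seconds before each retry: base * factor**n, capped at cap."""
    delays = []
    for n in range(attempts):
        delay = min(cap, base * factor ** n)
        # Optional jitter adds up to jitter * delay so that many clients
        # do not retry at the same instant; jitter=0.0 disables it.
        delay += random.uniform(0, jitter * delay)
        delays.append(delay)
    return delays


print(backoff_delays(5))  # [1.0, 2.0, 4.0, 8.0, 16.0]
```

Each wait doubles until it reaches the cap, which keeps early retries fast while avoiding a tight retry loop against a slow endpoint.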
Code: broken vs fixed
# Broken: no timeout argument, so the call runs under the default timeout
# (60 seconds in the trace above).
from google.cloud import aiplatform

client = aiplatform.gapic.EndpointServiceClient()
request = {
    "endpoint": "projects/my-project/locations/us-central1/endpoints/1234567890",
    "instances": [{"content": "Hello"}],
}
response = client.streaming_generate_content(request=request)  # raises DeadlineExceeded
print(response)

# Fixed: pass an explicit, longer timeout.
import os
from google.cloud import aiplatform

os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "/path/to/your/service-account.json"

client = aiplatform.gapic.EndpointServiceClient()
request = {
    "endpoint": "projects/my-project/locations/us-central1/endpoints/1234567890",
    "instances": [{"content": "Hello"}],
}
# Timeout raised to 120 seconds so that slower responses have time to complete
response = client.streaming_generate_content(request=request, timeout=120)
print(response)
Workaround
Wrap the streaming_generate_content call in a try/except block catching DeadlineExceeded, then retry the request with an increased timeout or after a short delay.
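The workaround can be sketched as follows, assuming the request is wrapped in a callable that accepts a timeout keyword. The helper name call_with_growing_timeout, the timeout sequence, and the pause length are hypothetical choices for illustration; the stand-in exception class only lets the sketch run where google-api-core is not installed.

```python
import time

try:
    from google.api_core.exceptions import DeadlineExceeded
except ImportError:
    # Stand-in so the sketch runs without google-api-core installed.
    class DeadlineExceeded(Exception):
        pass


def call_with_growing_timeout(call, timeouts=(60, 120, 240), pause=1.0, sleep=time.sleep):
    """Try call(timeout=t) for each t in timeouts; re-raise the last DeadlineExceeded."""
    if not timeouts:
        raise ValueError("timeouts must contain at least one value")
    last_error = None
    for attempt, timeout in enumerate(timeouts):
        try:
            return call(timeout=timeout)
        except DeadlineExceeded as exc:
            last_error = exc
            if attempt < len(timeouts) - 1:
                sleep(pause)  # short delay before the next, longer attempt
    raise last_error
```

A call site might look like call_with_growing_timeout(lambda timeout: client.streaming_generate_content(request=request, timeout=timeout)), reusing the client and request from the example above.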
Prevention
Retry DeadlineExceeded errors with exponential backoff, and set the timeout for streaming generate_content calls from the model response times you expect rather than relying on the default.
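One way to choose a timeout from expected response times is to derive it from durations you have already measured. The sketch below is an assumption about how that could be done, not a documented Vertex AI recommendation; suggest_timeout, the percentile, the headroom factor, and the 60-second floor are all hypothetical choices.

```python
import math


def suggest_timeout(durations, percentile=0.99, headroom=1.5, floor=60.0):
    """Suggest a timeout in seconds: a high percentile of observed durations times headroom."""
    if not durations:
        return floor  # no measurements yet, fall back to the floor
    ordered = sorted(durations)
    # Nearest-rank percentile: the smallest sample that has at least
    # `percentile` of all samples at or below it.
    rank = max(1, math.ceil(percentile * len(ordered)))
    return max(floor, ordered[rank - 1] * headroom)


print(suggest_timeout([10, 20, 80], percentile=1.0))  # 120.0
```

The headroom factor leaves room for responses slower than any seen so far, and the floor prevents a handful of fast samples from producing an unrealistically short timeout.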