Vertex AI error codes reference
Why this happens
Vertex AI API errors stem from invalid requests, authentication failures, missing permissions, rate limits, or internal server issues. For example, sending malformed JSON or omitting required parameters triggers 400 Bad Request (status INVALID_ARGUMENT). Using expired or missing credentials causes 401 Unauthorized (UNAUTHENTICATED). Calling an API without the required IAM roles results in 403 Forbidden (PERMISSION_DENIED). Exceeding quota or rate limits leads to 429 Too Many Requests (RESOURCE_EXHAUSTED). Failures on the server side return 500 Internal Server Error (INTERNAL).
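As a rough aid for triage, the error classes above can be sketched as a lookup table. The status names follow the standard Google API status codes; the `retryable` flags and the `describe_error` helper are assumptions made for illustration, not an official retry policy.

```python
# HTTP status code -> (canonical status name, whether a retry can help).
# The retryable flags are an illustrative assumption, not an official policy.
ERROR_TABLE = {
    400: ("INVALID_ARGUMENT", False),
    401: ("UNAUTHENTICATED", False),
    403: ("PERMISSION_DENIED", False),
    429: ("RESOURCE_EXHAUSTED", True),
    500: ("INTERNAL", True),
}


def describe_error(http_code: int) -> str:
    """Return a one-line summary of an error code and the suggested action."""
    status, retryable = ERROR_TABLE.get(http_code, ("UNKNOWN", False))
    action = "retry with backoff" if retryable else "fix the request or configuration"
    return f"{http_code} {status}: {action}"


print(describe_error(429))
```

Codes that are not in the table fall through to `UNKNOWN` and are treated as non-retryable, which is the more cautious default.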
Typical error output from the API looks like:
{
  "error": {
    "code": 429,
    "message": "Quota exceeded",
    "status": "RESOURCE_EXHAUSTED"
  }
}

With the Python SDK, the same errors surface as exceptions:

import os

import vertexai
from vertexai.generative_models import GenerativeModel

vertexai.init(project=os.environ["GOOGLE_CLOUD_PROJECT"], location="us-central1")
model = GenerativeModel("gemini-2.0-flash")

# Example of a malformed request causing a 400 error.
# Note: depending on the SDK version, an empty prompt may instead be
# rejected client-side before any request is sent.
response = model.generate_content("")  # Empty prompt triggers error

google.api_core.exceptions.BadRequest: 400 Bad Request: Request contains an invalid argument.
The fix
Validate inputs before sending requests, ensure authentication is correctly configured with Application Default Credentials or service account keys, and verify IAM permissions. Implement exponential backoff retry logic for 429 Too Many Requests errors to handle rate limits gracefully.
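Input validation can be as simple as a guard function that runs before any request is sent. The sketch below is one possible shape; `validate_prompt` and the `MAX_PROMPT_CHARS` limit are hypothetical names and values chosen for illustration, not part of the Vertex AI SDK.

```python
MAX_PROMPT_CHARS = 8000  # hypothetical limit, chosen only for illustration


def validate_prompt(prompt) -> str:
    """Return a cleaned prompt, or raise ValueError before any API call is made."""
    if not isinstance(prompt, str):
        raise ValueError("prompt must be a string")
    cleaned = prompt.strip()
    if not cleaned:
        raise ValueError("prompt must not be empty")
    if len(cleaned) > MAX_PROMPT_CHARS:
        raise ValueError(f"prompt exceeds {MAX_PROMPT_CHARS} characters")
    return cleaned


print(validate_prompt("  Explain quantum computing  "))
```

Rejecting bad input locally keeps invalid requests from counting against quota and gives callers a clearer error than a generic 400 response.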
Example code with retry and error handling:
import os
import time

import vertexai
from google.api_core.exceptions import BadRequest, Forbidden, TooManyRequests, Unauthorized
from vertexai.generative_models import GenerativeModel

vertexai.init(project=os.environ["GOOGLE_CLOUD_PROJECT"], location="us-central1")
model = GenerativeModel("gemini-2.0-flash")

prompt = "Explain quantum computing"
max_retries = 5

for attempt in range(max_retries):
    try:
        response = model.generate_content(prompt)
        print(response.text)
        break
    except TooManyRequests:
        # Rate limited: wait 1, 2, 4, 8, then 16 seconds between attempts.
        wait_time = 2 ** attempt
        print(f"Rate limit hit, retrying in {wait_time} seconds...")
        time.sleep(wait_time)
    except (Unauthorized, Forbidden) as auth_err:
        # Retrying will not help: credentials or IAM roles must be fixed.
        print(f"Authentication or permission error: {auth_err}")
        break
    except BadRequest as bad_req:
        # Retrying will not help: the request itself is invalid.
        print(f"Invalid request: {bad_req}")
        break
    except Exception as e:
        print(f"Unexpected error: {e}")
        break
else:
    # The loop finished without a break: every attempt was rate limited.
    print(f"Giving up after {max_retries} rate-limited attempts.")

Explain quantum computing in simple terms...
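When many clients retry on the same schedule, their retries can arrive in synchronized bursts. A common refinement is to cap the delay and add random jitter. The following is a minimal sketch of that idea; `backoff_delays` and its `base`, `cap`, and `rng` parameters are hypothetical names, and the defaults are illustrative rather than recommended values.

```python
import random


def backoff_delays(max_retries, base=1.0, cap=32.0, rng=random.random):
    """Yield one delay per retry: exponential growth, capped, with full jitter.

    rng must return a float in [0, 1]; it is injectable so tests can be deterministic.
    """
    for attempt in range(max_retries):
        ceiling = min(cap, base * (2 ** attempt))
        yield ceiling * rng()


# With jitter disabled (rng always returns 1.0) the schedule is the plain
# exponential sequence: 1.0, 2.0, 4.0
for delay in backoff_delays(3, rng=lambda: 1.0):
    print(delay)
```

In a retry loop, each yielded value would be passed to `time.sleep` before the next attempt.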
Preventing it in production
- Use robust input validation to avoid 400 Bad Request.
- Configure authentication with service accounts or Application Default Credentials correctly to prevent 401 Unauthorized.
- Assign proper IAM roles to avoid 403 Forbidden.
- Implement exponential backoff retries for 429 Too Many Requests to handle quota limits.
- Monitor API usage and set alerts for quota exhaustion.
- Use fallback models or cached responses to maintain availability during 500 Internal Server Error incidents.
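The fallback and caching idea from the list above can be sketched without any SDK calls. Here `generate_with_fallback`, `providers`, and `cache` are hypothetical names: each provider is assumed to be a callable that takes a prompt and either returns text or raises, for example a thin wrapper around a model's `generate_content`.

```python
def generate_with_fallback(prompt, providers, cache):
    """Try each provider in order; fall back to a cached answer if all fail."""
    for call in providers:
        try:
            answer = call(prompt)
        except Exception:
            # Illustration only: real code should catch the specific
            # server-error exception types instead of Exception.
            continue
        cache[prompt] = answer  # remember the latest good answer
        return answer
    if prompt in cache:
        return cache[prompt]
    raise RuntimeError("all providers failed and no cached response is available")


def primary(prompt):
    # Stand-in for a model call that is currently failing.
    raise RuntimeError("500 Internal Server Error")


def secondary(prompt):
    # Stand-in for a fallback model that is healthy.
    return f"fallback answer for: {prompt}"


cache = {}
print(generate_with_fallback("Explain quantum computing", [primary, secondary], cache))
```

A plain dict is used as the cache to keep the sketch short; a production cache would also need expiry and size limits.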
Key Takeaways
- Validate all request inputs to prevent 400 errors in Vertex AI API calls.
- Configure authentication and IAM permissions correctly to avoid 401 and 403 errors.
- Implement exponential backoff retries to handle 429 rate limit errors gracefully.