
Vertex AI error codes reference

Quick answer
Common Vertex AI error codes include 400 Bad Request for invalid inputs, 401 Unauthorized for authentication failures, 403 Forbidden for permission issues, 429 Too Many Requests for rate limiting, and 500 Internal Server Error for server-side problems. Handling these requires proper request validation, authentication setup, and retry logic.
⚡ QUICK FIX
Add exponential backoff retry logic around your API call so that 429 Too Many Requests (RESOURCE_EXHAUSTED) errors are retried automatically.

Why this happens

Vertex AI API errors occur due to invalid requests, authentication failures, permission restrictions, rate limits, or internal server issues. For example, sending malformed JSON or missing required parameters triggers 400 Bad Request. Using expired or missing credentials causes 401 Unauthorized. Calling APIs without proper IAM roles results in 403 Forbidden. Exceeding quota limits leads to 429 Too Many Requests. Server errors return 500 Internal Server Error.

Typical error output from the API looks like:

{
  "error": {
    "code": 429,
    "message": "Quota exceeded",
    "status": "RESOURCE_EXHAUSTED"
  }
}
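
A payload like the one above can be turned into a retry decision with a few lines of standard-library Python. This is a minimal sketch, not part of any SDK: `classify_error` and `RETRYABLE_STATUSES` are hypothetical names, and treating `RESOURCE_EXHAUSTED`, `UNAVAILABLE`, and `INTERNAL` as retryable is an assumption you may want to adjust.

```python
import json

# Assumption: statuses treated as transient and therefore worth retrying.
RETRYABLE_STATUSES = {"RESOURCE_EXHAUSTED", "UNAVAILABLE", "INTERNAL"}


def classify_error(payload: str) -> dict:
    """Parse a JSON error body and report whether a retry is reasonable."""
    error = json.loads(payload).get("error", {})
    status = error.get("status", "UNKNOWN")
    return {
        "code": error.get("code"),
        "status": status,
        "message": error.get("message", ""),
        "retryable": status in RETRYABLE_STATUSES,
    }


sample = '{"error": {"code": 429, "message": "Quota exceeded", "status": "RESOURCE_EXHAUSTED"}}'
result = classify_error(sample)
print(result["code"], result["status"], result["retryable"])
```

Client errors such as `INVALID_ARGUMENT` or `PERMISSION_DENIED` fall outside the retryable set here, since resending the same request would most likely fail the same way.
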
python
import os

import vertexai
from vertexai.generative_models import GenerativeModel

vertexai.init(project=os.environ["GOOGLE_CLOUD_PROJECT"], location="us-central1")
model = GenerativeModel("gemini-2.0-flash")

# Example of a malformed request causing a 400 error
response = model.generate_content("")  # Empty prompt is an invalid argument
output
google.api_core.exceptions.BadRequest: 400 Bad Request: Request contains an invalid argument.

The fix

Validate inputs before sending requests, ensure authentication is correctly configured with Application Default Credentials or service account keys, and verify IAM permissions. Implement exponential backoff retry logic for 429 Too Many Requests errors to handle rate limits gracefully.
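
Input validation can be as simple as a guard function that runs before any request is sent. The following is a sketch under stated assumptions: `validate_prompt` is a hypothetical helper, and `MAX_PROMPT_CHARS` is an illustrative limit rather than a documented Vertex AI value.

```python
# Hypothetical limit for illustration; real limits depend on the model.
MAX_PROMPT_CHARS = 30_000


def validate_prompt(prompt) -> str:
    """Return a cleaned prompt, or raise ValueError before any API call is made."""
    if not isinstance(prompt, str):
        raise ValueError(f"prompt must be a string, got {type(prompt).__name__}")
    cleaned = prompt.strip()
    if not cleaned:
        raise ValueError("prompt must not be empty or whitespace-only")
    if len(cleaned) > MAX_PROMPT_CHARS:
        raise ValueError(f"prompt exceeds {MAX_PROMPT_CHARS} characters")
    return cleaned


# Try a valid prompt, a whitespace-only prompt, and a non-string value.
for candidate in ["Explain quantum computing", "   ", None]:
    try:
        print("ok:", validate_prompt(candidate))
    except ValueError as err:
        print("rejected:", err)
```

Rejecting bad input locally gives a clearer error message than a generic 400 response and avoids spending quota on requests that cannot succeed.
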

Example code with retry and error handling:

python
import os
import time

import vertexai
from vertexai.generative_models import GenerativeModel
from google.api_core.exceptions import BadRequest, Forbidden, TooManyRequests, Unauthorized

vertexai.init(project=os.environ["GOOGLE_CLOUD_PROJECT"], location="us-central1")
model = GenerativeModel("gemini-2.0-flash")

prompt = "Explain quantum computing"

max_retries = 5
for attempt in range(max_retries):
    try:
        response = model.generate_content(prompt)
        print(response.text)
        break
    except TooManyRequests:
        # 429: wait 1, 2, 4, 8, 16 seconds between attempts
        wait_time = 2 ** attempt
        print(f"Rate limit hit, retrying in {wait_time} seconds...")
        time.sleep(wait_time)
    except (Unauthorized, Forbidden) as auth_err:
        # 401/403: retrying will not help; fix credentials or IAM roles
        print(f"Authentication or permission error: {auth_err}")
        break
    except BadRequest as bad_req:
        # 400: the request itself is invalid, so do not retry
        print(f"Invalid request: {bad_req}")
        break
    except Exception as e:
        print(f"Unexpected error: {e}")
        break
else:
    # Runs only when no `break` occurred, i.e. every attempt was rate limited
    print(f"Giving up after {max_retries} rate-limited attempts.")
output
Explain quantum computing in simple terms...

Preventing it in production

  • Use robust input validation to avoid 400 Bad Request.
  • Configure authentication with service accounts or Application Default Credentials correctly to prevent 401 Unauthorized.
  • Assign proper IAM roles to avoid 403 Forbidden.
  • Implement exponential backoff retries for 429 Too Many Requests to handle quota limits.
  • Monitor API usage and set alerts for quota exhaustion.
  • Use fallback models or cached responses to maintain availability during 500 Internal Server Error incidents.
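
The retry and fallback points above can be combined in one reusable helper. The sketch below uses only the standard library and a stand-in for the API call, so it runs without credentials. `call_with_backoff`, `FakeQuotaError`, and `flaky_call` are hypothetical names, and the random jitter is an addition not shown in the earlier example. With Vertex AI you would presumably pass exception classes such as `TooManyRequests`, `ServiceUnavailable`, and `InternalServerError` from `google.api_core.exceptions` as `retry_on`.

```python
import random
import time


def call_with_backoff(call, retry_on, fallback=None, max_retries=5,
                      base_delay=1.0, max_delay=32.0, sleep=time.sleep):
    """Run call(); retry on the given exceptions with capped, jittered delays.

    If every attempt fails, return fallback() when provided, else re-raise.
    """
    for attempt in range(max_retries):
        try:
            return call()
        except retry_on:
            if attempt == max_retries - 1:
                if fallback is not None:
                    return fallback()
                raise
            # Full jitter: sleep a random time up to the capped exponential delay.
            delay = random.uniform(0, min(max_delay, base_delay * 2 ** attempt))
            sleep(delay)


# Stand-in for a quota error and a flaky API call (no network involved).
class FakeQuotaError(Exception):
    pass


attempts = {"count": 0}


def flaky_call():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise FakeQuotaError("quota exceeded")
    return "model response"


# sleep is replaced with a no-op so the demo finishes instantly.
result = call_with_backoff(flaky_call, retry_on=(FakeQuotaError,), sleep=lambda s: None)
print(result, attempts["count"])
```

Jitter spreads retries from many clients over time so they do not all hit the API again at the same moment, and the `fallback` callable is where a cached response or a call to a secondary model could go.
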

Key Takeaways

  • Validate all request inputs to prevent 400 errors in Vertex AI API calls.
  • Configure authentication and IAM permissions correctly to avoid 401 and 403 errors.
  • Implement exponential backoff retries to handle 429 rate limit errors gracefully.
Verified 2026-04 · gemini-2.0-flash