How to handle errors in LlamaIndex
Handle errors in LlamaIndex by catching exceptions such as RateLimitError or APIConnectionError during index queries or API calls. Implement retry logic with exponential backoff and validate inputs to prevent common failures.
Why this happens
LlamaIndex interacts with external LLM APIs which can fail due to rate limits, network issues, or invalid inputs. For example, calling index.query() without handling exceptions can raise RateLimitError or APIConnectionError. These errors occur when the API quota is exceeded or connectivity is unstable.
Typical error output looks like:
openai.error.RateLimitError: You have exceeded your current quota, please check your plan and billing details.
from llama_index import GPTSimpleVectorIndex
index = GPTSimpleVectorIndex.load_from_disk('index.json')
# This call may raise exceptions if API limits are hit
response = index.query('What is AI?')
print(response)
The fix
Wrap your index.query() calls in try-except blocks and implement exponential backoff retries to handle transient API errors gracefully. This prevents your app from crashing and respects API rate limits.
import time
from llama_index import GPTSimpleVectorIndex
from openai.error import RateLimitError, APIConnectionError

index = GPTSimpleVectorIndex.load_from_disk('index.json')

max_retries = 5
retry_delay = 1  # seconds

for attempt in range(max_retries):
    try:
        response = index.query('What is AI?')
        print(response)
        break
    except (RateLimitError, APIConnectionError) as e:
        print(f'API error: {e}, retrying in {retry_delay} seconds...')
        time.sleep(retry_delay)
        retry_delay *= 2  # exponential backoff
else:
    print('Failed after retries')
Sample output when a retry eventually succeeds:
API error: You have exceeded your current quota, please check your plan and billing details., retrying in 1 seconds...
<further retries>
<Response text from index.query()>
Preventing it in production
Use robust retry mechanisms with capped exponential backoff and jitter to avoid thundering herd problems. Validate query inputs before sending to LlamaIndex to catch malformed requests early. Monitor API usage and implement fallback logic such as cached responses or degraded modes when errors persist.
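The capped-backoff-with-jitter idea above can be sketched as a small helper. The function name `backoff_delays` and the default limits are illustrative assumptions, not part of any library API:

```python
import random


def backoff_delays(max_retries=5, base=1.0, cap=30.0):
    """Yield capped exponential backoff delays with full jitter.

    Each delay is drawn uniformly from [0, min(cap, base * 2**attempt)],
    so retries spread out randomly instead of all clients waking at once
    (the thundering-herd problem).
    """
    for attempt in range(max_retries):
        yield random.uniform(0, min(cap, base * 2 ** attempt))


# The upper bounds grow 1, 2, 4, 8, 16 seconds but never exceed the cap.
delays = list(backoff_delays())
print(delays)
```

Each retry attempt would then `time.sleep()` on the next yielded delay instead of doubling a fixed value.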
Consider wrapping LlamaIndex calls in utility functions that centralize error handling and logging for easier maintenance.
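One way such a utility function might look is sketched below. It is generic over any zero-argument callable so the demo runs without API credentials; the name `query_with_retries` and its parameters are assumptions for illustration. In real use, `query_fn` would be something like `lambda: index.query('What is AI?')` and `retryable` would be `(RateLimitError, APIConnectionError)`:

```python
import logging
import time

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("llamaindex_utils")


def query_with_retries(query_fn, retryable=(Exception,), max_retries=5, base_delay=1.0):
    """Call query_fn(), retrying retryable exceptions with exponential backoff.

    Centralizes the try/except and logging so call sites stay clean.
    """
    delay = base_delay
    for attempt in range(1, max_retries + 1):
        try:
            return query_fn()
        except retryable as exc:
            logger.warning("Attempt %d failed: %s; retrying in %.2fs", attempt, exc, delay)
            time.sleep(delay)
            delay *= 2
    raise RuntimeError(f"Query failed after {max_retries} retries")


# Demo: a stand-in callable that fails twice, then succeeds.
calls = {"n": 0}

def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("transient network error")
    return "answer"

print(query_with_retries(flaky, retryable=(ConnectionError,), base_delay=0.01))
```

Because logging lives in one place, you can later swap `print`-style debugging for structured logs or metrics without touching every call site.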
Key Takeaways
- Always catch RateLimitError and APIConnectionError when calling LlamaIndex APIs.
- Implement exponential backoff retries to handle transient API failures gracefully.
- Validate inputs before querying to prevent avoidable errors.
- Centralize error handling and logging for maintainable production code.
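As a concrete example of the input-validation takeaway, a pre-flight check might look like this. The helper name `validate_query` and the length limit are illustrative assumptions, not LlamaIndex defaults:

```python
def validate_query(text, max_len=1000):
    """Reject empty or oversized queries before they reach the API.

    Catching malformed requests locally avoids wasting quota on calls
    that would fail anyway.
    """
    if not isinstance(text, str) or not text.strip():
        raise ValueError("Query must be a non-empty string")
    if len(text) > max_len:
        raise ValueError(f"Query exceeds {max_len} characters")
    return text.strip()


print(validate_query('  What is AI?  '))  # prints: What is AI?
```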