How to use Gemini API with LlamaIndex
Quick answer
Call Gemini models through the OpenAI SDK v1 by pointing the client at Google's OpenAI-compatible endpoint. Use gemini-1.5-pro (or newer) for completions and a dedicated Gemini embedding model for embeddings, then pass these outputs to LlamaIndex for indexing and querying. Initialize the OpenAI client with your Gemini API key and the compatibility base URL, call the Gemini model, and integrate the results into LlamaIndex data structures.
Prerequisites
- Python 3.8+
- A Gemini API key (from Google AI Studio)
- pip install "openai>=1.0" "llama-index>=0.10"
Setup
Install the required Python packages and set an environment variable for the API key.
- Install packages:
pip install "openai>=1.0" "llama-index>=0.10"
- Set the environment variable:
export OPENAI_API_KEY='your_api_key_here' (Linux/macOS)
setx OPENAI_API_KEY "your_api_key_here" (Windows)
Note: the variable is named OPENAI_API_KEY because the OpenAI SDK reads it by default; when using the Gemini compatibility endpoint, its value should be your Gemini API key.
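Before moving on, it can help to fail fast when the key is missing rather than hitting an opaque authentication error later. A minimal check (pure Python; the variable name follows the setup step above):

```python
import os

def get_api_key() -> str:
    """Return the API key from the environment, failing fast if it is unset."""
    key = os.environ.get("OPENAI_API_KEY")
    if not key:
        raise RuntimeError(
            "OPENAI_API_KEY is not set; export it as shown in the setup step."
        )
    return key
```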
Step by step
This example demonstrates how to use the Gemini API with LlamaIndex to create a simple document index and query it. The sketch below assumes llama-index >= 0.10, where ServiceContext was replaced by the global Settings object and custom embedding models subclass BaseEmbedding. It also assumes Gemini's OpenAI-compatible endpoint; note that embeddings use a dedicated embedding model (text-embedding-004 here), since gemini-1.5-pro is a chat model, not an embedding model.
import os
from typing import List
from openai import OpenAI
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex, Settings
from llama_index.core.embeddings import BaseEmbedding

# Point the OpenAI client at Gemini's OpenAI-compatible endpoint.
# OPENAI_API_KEY should hold a Gemini API key here.
client = OpenAI(
    api_key=os.environ["OPENAI_API_KEY"],
    base_url="https://generativelanguage.googleapis.com/v1beta/openai/",
)

class GeminiEmbedding(BaseEmbedding):
    """Custom embed model that proxies to Gemini's embedding endpoint."""

    def _embed(self, text: str) -> List[float]:
        # Gemini exposes dedicated embedding models; gemini-1.5-pro is chat-only.
        response = client.embeddings.create(model="text-embedding-004", input=text)
        return response.data[0].embedding

    def _get_text_embedding(self, text: str) -> List[float]:
        return self._embed(text)

    def _get_query_embedding(self, query: str) -> List[float]:
        return self._embed(query)

    async def _aget_query_embedding(self, query: str) -> List[float]:
        return self._embed(query)

def query_gemini_with_llamaindex():
    # Load documents from a local directory
    documents = SimpleDirectoryReader("./data").load_data()
    # Register the custom embed model globally (Settings replaced ServiceContext)
    Settings.embed_model = GeminiEmbedding()
    # Build the index
    index = VectorStoreIndex.from_documents(documents)
    # Query the index
    query = "What is the main topic of the documents?"
    response = index.as_query_engine().query(query)
    print("Query response:", response.response)

if __name__ == "__main__":
    query_gemini_with_llamaindex()
Note that answer synthesis at query time also requires an LLM; LlamaIndex defaults to OpenAI's, so configure Settings.llm as well if you want Gemini end to end.
Output
Query response: The main topic of the documents is ...
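Embedding endpoints typically cap how many inputs a single request can carry, so large document sets are usually embedded in batches. A minimal pure-Python batching helper (the batch size of 100 is an illustrative assumption, not a documented Gemini limit):

```python
from typing import Iterator, List

def batched(texts: List[str], batch_size: int = 100) -> Iterator[List[str]]:
    """Yield successive batches of at most batch_size texts."""
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    for start in range(0, len(texts), batch_size):
        yield texts[start:start + batch_size]

# Usage sketch: one API call per batch instead of one call per text, e.g.
# all_vectors = []
# for batch in batched(doc_texts, 100):
#     response = client.embeddings.create(model="text-embedding-004", input=batch)
#     all_vectors.extend(item.embedding for item in response.data)
```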
Common variations
You can customize the integration by:
- Using gemini-2.0-flash for faster responses.
- Switching to async calls with asyncio and the AsyncOpenAI client.
- Streaming responses for real-time output.
- Using other LlamaIndex index types, such as SummaryIndex or TreeIndex (the successors to GPTListIndex and GPTTreeIndex).
For example, an async call (this requires the AsyncOpenAI client; in SDK v1 the synchronous client has no acreate method):
import asyncio
import os
from openai import AsyncOpenAI

async def async_gemini_call():
    # AsyncOpenAI supports await; OPENAI_API_KEY holds a Gemini key here.
    client = AsyncOpenAI(
        api_key=os.environ["OPENAI_API_KEY"],
        base_url="https://generativelanguage.googleapis.com/v1beta/openai/",
    )
    response = await client.chat.completions.create(
        model="gemini-2.0-flash",
        messages=[{"role": "user", "content": "Hello from async Gemini!"}],
    )
    print(response.choices[0].message.content)

if __name__ == "__main__":
    asyncio.run(async_gemini_call())
Output
Hello from async Gemini!
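The streaming variation mentioned above follows the same pattern. A sketch, assuming the OpenAI-compatible endpoint honors the SDK's stream=True flag (the openai import is kept inside the function so the sketch only needs the package when actually run):

```python
def stream_gemini(prompt: str) -> None:
    """Print a Gemini chat response chunk by chunk as it streams in."""
    import os
    from openai import OpenAI  # imported lazily; requires `pip install openai`

    client = OpenAI(
        api_key=os.environ["OPENAI_API_KEY"],
        base_url="https://generativelanguage.googleapis.com/v1beta/openai/",
    )
    stream = client.chat.completions.create(
        model="gemini-2.0-flash",
        messages=[{"role": "user", "content": prompt}],
        stream=True,
    )
    for chunk in stream:
        # Each chunk carries an incremental delta, not the full message.
        delta = chunk.choices[0].delta.content
        if delta:
            print(delta, end="", flush=True)
    print()

if __name__ == "__main__":
    import os
    if os.environ.get("OPENAI_API_KEY"):  # only call the API when a key is set
        stream_gemini("Hello from streaming Gemini!")
```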
Troubleshooting
If you encounter authentication errors, verify that your OPENAI_API_KEY environment variable is set and contains a valid Gemini API key. A 404 or model-not-found error usually means the client's base_url is not pointing at the Gemini-compatible endpoint. For rate limits, retry with exponential backoff. If embeddings fail, ensure your input texts are neither empty nor larger than the model's input limit.
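The exponential-backoff advice above can be wrapped in a small helper. A minimal sketch (pure Python; the retry count and delays are illustrative choices, and a production version would catch only rate-limit errors rather than all exceptions):

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")

def retry_with_backoff(
    fn: Callable[[], T],
    max_retries: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, retrying with exponentially growing delays on failure."""
    for attempt in range(max_retries):
        try:
            return fn()
        except Exception:
            if attempt == max_retries - 1:
                raise  # out of retries: surface the original error
            sleep(base_delay * (2 ** attempt))  # 1s, 2s, 4s, ...
    raise AssertionError("unreachable")

# Usage sketch: wrap any flaky API call, e.g.
# vectors = retry_with_backoff(lambda: client.embeddings.create(...))
```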
Key Takeaways
- Use the OpenAI SDK v1 against Gemini's OpenAI-compatible endpoint, with a dedicated Gemini embedding model for embedding generation in LlamaIndex.
- Wrap Gemini API calls in a custom embedding class to integrate with LlamaIndex's settings.
- Async and streaming variants improve responsiveness and scalability.
- Set your API key as an environment variable to avoid authentication issues.