How to use Gemini API with LlamaIndex
Quick answer
Call Gemini models through the OpenAI SDK v1 by pointing the client at Google's OpenAI-compatible endpoint. Use gemini-1.5-pro (or newer) for completions and a dedicated Gemini embedding model for embeddings, then pass these outputs to LlamaIndex for indexing and querying. Initialize the OpenAI client with your Gemini API key and the compatibility base URL, call the Gemini model, and integrate the results into LlamaIndex data structures.
Prerequisites
- Python 3.8+
- A Gemini API key (from Google AI Studio)
- pip install "openai>=1.0" "llama-index>=0.10"
Setup
Install the required Python packages and set an environment variable for the API key.
- Install packages:
pip install "openai>=1.0" "llama-index>=0.10"
- Set the environment variable:
export OPENAI_API_KEY='your_api_key_here' (Linux/macOS)
setx OPENAI_API_KEY "your_api_key_here" (Windows)
Note: the variable is named OPENAI_API_KEY because the OpenAI SDK reads it by default; when using the Gemini compatibility endpoint, its value should be your Gemini API key.
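Before moving on, it can help to fail fast when the key is missing rather than hitting an opaque authentication error later. A minimal check (pure Python; the variable name follows the setup step above):

```python
import os

def get_api_key() -> str:
    """Return the API key from the environment, failing fast if it is unset."""
    key = os.environ.get("OPENAI_API_KEY")
    if not key:
        raise RuntimeError(
            "OPENAI_API_KEY is not set; export it as shown in the setup step."
        )
    return key
```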
Step by step
This example demonstrates how to use the Gemini API with LlamaIndex to create a simple document index and query it. The sketch below assumes llama-index >= 0.10, where ServiceContext was replaced by the global Settings object and custom embedding models subclass BaseEmbedding. It also assumes Gemini's OpenAI-compatible endpoint; note that embeddings use a dedicated embedding model (text-embedding-004 here), since gemini-1.5-pro is a chat model, not an embedding model.
import os
from typing import List
from openai import OpenAI
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex, Settings
from llama_index.core.embeddings import BaseEmbedding

# Point the OpenAI client at Gemini's OpenAI-compatible endpoint.
# OPENAI_API_KEY should hold a Gemini API key here.
client = OpenAI(
    api_key=os.environ["OPENAI_API_KEY"],
    base_url="https://generativelanguage.googleapis.com/v1beta/openai/",
)

class GeminiEmbedding(BaseEmbedding):
    """Custom embed model that proxies to Gemini's embedding endpoint."""

    def _embed(self, text: str) -> List[float]:
        # Gemini exposes dedicated embedding models; gemini-1.5-pro is chat-only.
        response = client.embeddings.create(model="text-embedding-004", input=text)
        return response.data[0].embedding

    def _get_text_embedding(self, text: str) -> List[float]:
        return self._embed(text)

    def _get_query_embedding(self, query: str) -> List[float]:
        return self._embed(query)

    async def _aget_query_embedding(self, query: str) -> List[float]:
        return self._embed(query)

def query_gemini_with_llamaindex():
    # Load documents from a local directory
    documents = SimpleDirectoryReader("./data").load_data()
    # Register the custom embed model globally (Settings replaced ServiceContext)
    Settings.embed_model = GeminiEmbedding()
    # Build the index
    index = VectorStoreIndex.from_documents(documents)
    # Query the index
    query = "What is the main topic of the documents?"
    response = index.as_query_engine().query(query)
    print("Query response:", response.response)

if __name__ == "__main__":
    query_gemini_with_llamaindex()
Note that answer synthesis at query time also requires an LLM; LlamaIndex defaults to OpenAI's, so configure Settings.llm as well if you want Gemini end to end.
Output
Query response: The main topic of the documents is ...
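Embedding endpoints typically cap how many inputs a single request can carry, so large document sets are usually embedded in batches. A minimal pure-Python batching helper (the batch size of 100 is an illustrative assumption, not a documented Gemini limit):

```python
from typing import Iterator, List

def batched(texts: List[str], batch_size: int = 100) -> Iterator[List[str]]:
    """Yield successive batches of at most batch_size texts."""
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    for start in range(0, len(texts), batch_size):
        yield texts[start:start + batch_size]

# Usage sketch: one API call per batch instead of one call per text, e.g.
# all_vectors = []
# for batch in batched(doc_texts, 100):
#     response = client.embeddings.create(model="text-embedding-004", input=batch)
#     all_vectors.extend(item.embedding for item in response.data)
```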
Common variations
You can customize the integration by:
- Using gemini-2.0-flash for faster responses.
- Switching to async calls with asyncio and the AsyncOpenAI client.
- Streaming responses for real-time output.
- Using other LlamaIndex index types, such as SummaryIndex or TreeIndex (the successors to GPTListIndex and GPTTreeIndex).
For example, an async call (this requires the AsyncOpenAI client; in SDK v1 the synchronous client has no acreate method):
import asyncio
import os
from openai import AsyncOpenAI

async def async_gemini_call():
    # AsyncOpenAI supports await; OPENAI_API_KEY holds a Gemini key here.
    client = AsyncOpenAI(
        api_key=os.environ["OPENAI_API_KEY"],
        base_url="https://generativelanguage.googleapis.com/v1beta/openai/",
    )
    response = await client.chat.completions.create(
        model="gemini-2.0-flash",
        messages=[{"role": "user", "content": "Hello from async Gemini!"}],
    )
    print(response.choices[0].message.content)

if __name__ == "__main__":
    asyncio.run(async_gemini_call())
Output
Hello from async Gemini!
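The streaming variation mentioned above follows the same pattern. A sketch, assuming the OpenAI-compatible endpoint honors the SDK's stream=True flag (the openai import is kept inside the function so the sketch only needs the package when actually run):

```python
def stream_gemini(prompt: str) -> None:
    """Print a Gemini chat response chunk by chunk as it streams in."""
    import os
    from openai import OpenAI  # imported lazily; requires `pip install openai`

    client = OpenAI(
        api_key=os.environ["OPENAI_API_KEY"],
        base_url="https://generativelanguage.googleapis.com/v1beta/openai/",
    )
    stream = client.chat.completions.create(
        model="gemini-2.0-flash",
        messages=[{"role": "user", "content": prompt}],
        stream=True,
    )
    for chunk in stream:
        # Each chunk carries an incremental delta, not the full message.
        delta = chunk.choices[0].delta.content
        if delta:
            print(delta, end="", flush=True)
    print()

if __name__ == "__main__":
    import os
    if os.environ.get("OPENAI_API_KEY"):  # only call the API when a key is set
        stream_gemini("Hello from streaming Gemini!")
```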
Troubleshooting
If you encounter authentication errors, verify that your OPENAI_API_KEY environment variable is set and contains a valid Gemini API key. A 404 or model-not-found error usually means the client's base_url is not pointing at the Gemini-compatible endpoint. For rate limits, retry with exponential backoff. If embeddings fail, ensure your input texts are neither empty nor larger than the model's input limit.
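The exponential-backoff advice above can be wrapped in a small helper. A minimal sketch (pure Python; the retry count and delays are illustrative choices, and a production version would catch only rate-limit errors rather than all exceptions):

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")

def retry_with_backoff(
    fn: Callable[[], T],
    max_retries: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, retrying with exponentially growing delays on failure."""
    for attempt in range(max_retries):
        try:
            return fn()
        except Exception:
            if attempt == max_retries - 1:
                raise  # out of retries: surface the original error
            sleep(base_delay * (2 ** attempt))  # 1s, 2s, 4s, ...
    raise AssertionError("unreachable")

# Usage sketch: wrap any flaky API call, e.g.
# vectors = retry_with_backoff(lambda: client.embeddings.create(...))
```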
Key Takeaways
- Use the OpenAI SDK v1 against Gemini's OpenAI-compatible endpoint, with a dedicated Gemini embedding model for embedding generation in LlamaIndex.
- Wrap Gemini API calls in a custom embedding class to integrate with LlamaIndex's settings.
- Async and streaming variants improve responsiveness and scalability.
- Set your API key as an environment variable to avoid authentication issues.