How-to · Beginner · 3 min read

How to call Gemini with LiteLLM

Quick answer
Use LiteLLM's completion() function, passing a Gemini model name with the gemini/ prefix (e.g., gemini/gemini-2.5-pro) along with your messages, to get chat completions from Gemini.

PREREQUISITES

  • Python 3.8+
  • pip install litellm
  • Gemini API key (e.g., from Google AI Studio) exported as GEMINI_API_KEY

Setup

Install the litellm Python package and set your Gemini API key so LiteLLM can authenticate with the Gemini API.

Run the following command to install:

bash
pip install litellm

Step by step

Here is a complete example that calls Gemini's gemini-2.5-pro model using litellm. Export your Gemini API key as the GEMINI_API_KEY environment variable (or pass it explicitly via the api_key parameter) before running.

python
from litellm import completion

# LiteLLM reads the Gemini API key from the GEMINI_API_KEY
# environment variable (you can also pass api_key= explicitly)

# Define the chat messages
messages = [
    {"role": "user", "content": "Write a Python function to reverse a string."}
]

# Call Gemini via LiteLLM; the "gemini/" prefix routes the request
# to the Gemini API provider
response = completion(model="gemini/gemini-2.5-pro", messages=messages)

# Print the assistant's reply
print(response.choices[0].message.content)
output
def reverse_string(s):
    return s[::-1]

Common variations

You can call different Gemini models by changing the model parameter, for example gemini/gemini-2.0-flash for faster, cheaper responses or gemini/gemini-1.5-pro if you need the older generation.
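Since the model is just a string, you can centralize the choice in one place. A minimal sketch (the pick_gemini_model helper and its fast flag are illustrative names, not part of LiteLLM):

```python
# Illustrative helper: map a speed preference to a Gemini model string.
# The "gemini/" prefix tells LiteLLM which provider to route the call to.
def pick_gemini_model(fast: bool = False) -> str:
    return "gemini/gemini-2.0-flash" if fast else "gemini/gemini-2.5-pro"

# Pass the result straight to LiteLLM, e.g.:
#   completion(model=pick_gemini_model(fast=True), messages=messages)
print(pick_gemini_model())           # gemini/gemini-2.5-pro
print(pick_gemini_model(fast=True))  # gemini/gemini-2.0-flash
```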

LiteLLM also supports async calls via acompletion():

python
import asyncio
from litellm import acompletion

async def async_call():
    messages = [{"role": "user", "content": "Explain recursion in simple terms."}]
    # acompletion() is the async counterpart of completion()
    response = await acompletion(model="gemini/gemini-2.5-pro", messages=messages)
    print(response.choices[0].message.content)

asyncio.run(async_call())
output
Recursion is when a function calls itself to solve smaller parts of a problem until it reaches a base case.

Troubleshooting

  • If you get authentication errors, verify that your GEMINI_API_KEY environment variable is set correctly.
  • If the model is not found, confirm you have access to the specified Gemini model and the model name is correct.
  • For network issues, check your internet connection and firewall settings.
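A quick preflight check can catch the most common failure (a missing key) before any request is made. A minimal sketch, assuming the key lives in GEMINI_API_KEY (the preflight helper is illustrative, not part of LiteLLM):

```python
import os

# Illustrative preflight check: confirm the Gemini key is present
# before handing a request to LiteLLM.
def preflight(env) -> str:
    if not env.get("GEMINI_API_KEY"):
        return "missing: export GEMINI_API_KEY before calling Gemini"
    return "ok"

print(preflight(os.environ))
```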

Key Takeaways

  • Use litellm's completion() with the model parameter set to a gemini/-prefixed model name to call Gemini.
  • Export your Gemini API key as the GEMINI_API_KEY environment variable for authentication.
  • LiteLLM supports both synchronous completion() and asynchronous acompletion() calls.
  • Switch Gemini models easily by changing the model string.
  • Check environment variables and model access if you encounter errors.
Verified 2026-04 · gemini-2.5-pro, gemini-2.0-flash, gemini-1.5-pro