Code beginner · 3 min read

How to authenticate with the Gemini API in Python

Direct answer
Authenticate with the Gemini API in Python by storing your API key in the GEMINI_API_KEY environment variable, reading it with os.environ, and passing it to genai.Client(api_key=...) from the google-genai SDK.

Setup

Install
bash
pip install google-genai
Env vars
GEMINI_API_KEY
Imports
python
import os
from google import genai
from google.genai import types

Examples

In: Send a simple prompt "Hello, Gemini!"
Out: Gemini response: Hello, Gemini! How can I assist you today?
In: Ask Gemini to summarize a text about AI advancements
Out: Gemini response: AI advancements have accelerated rapidly, enabling new applications in healthcare, finance, and more.
In: Send an empty prompt
Out: Gemini response: Please provide a valid input to proceed.
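Client-side validation can catch the empty-prompt case before spending a network round trip; a minimal sketch (safe_prompt is a hypothetical helper of ours, not part of the SDK):

```python
def safe_prompt(text: str) -> str:
    # Hypothetical guard: reject blank input locally instead of sending
    # a request the API is likely to refuse.
    if not text.strip():
        raise ValueError("Prompt must not be empty")
    return text

# safe_prompt("Hello, Gemini!") returns the prompt unchanged;
# safe_prompt("   ") raises ValueError.
```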

Integration steps

  1. Set your Gemini API key in the environment variable GEMINI_API_KEY
  2. Import os and the SDK (from google import genai)
  3. Initialize the client with genai.Client(api_key=...) using the key from os.environ
  4. Call client.models.generate_content with the desired model and prompt
  5. Read the generated text from response.text

Full code

python
import os
from google import genai
from google.genai import types

def main():
    # Fail fast if the key is missing rather than hitting an opaque auth error.
    api_key = os.environ.get("GEMINI_API_KEY")
    if not api_key:
        raise ValueError("GEMINI_API_KEY environment variable not set")

    client = genai.Client(api_key=api_key)

    response = client.models.generate_content(
        model="gemini-1.5-pro",
        contents="Hello, Gemini!",
        config=types.GenerateContentConfig(max_output_tokens=100),
    )

    print("Gemini response:", response.text)

if __name__ == "__main__":
    main()
output
Gemini response: Hello, Gemini! How can I assist you today?

API trace

The model name travels in the request URL (models/gemini-1.5-pro:generateContent) and the API key in the x-goog-api-key header.

Request
json
{"contents": [{"parts": [{"text": "Hello, Gemini!"}]}], "generationConfig": {"maxOutputTokens": 100}}
Response
json
{"candidates": [{"content": {"parts": [{"text": "Hello, Gemini! How can I assist you today?"}]}}], "usageMetadata": {"totalTokenCount": 25}}
Extract: response.text (the SDK concatenates the candidate's content parts for you)
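Under the hood, every call is an HTTPS POST whose API key travels in the x-goog-api-key header. A minimal standard-library sketch of constructing (not sending) such a request, assuming the public generativelanguage.googleapis.com endpoint:

```python
import json
import os
import urllib.request

# The SDK builds an equivalent request on your behalf; the key goes in a
# header, never in the URL or the JSON body.
url = (
    "https://generativelanguage.googleapis.com/v1beta/"
    "models/gemini-2.0-flash:generateContent"
)
payload = {"contents": [{"parts": [{"text": "Hello, Gemini!"}]}]}
request = urllib.request.Request(
    url,
    data=json.dumps(payload).encode("utf-8"),
    headers={
        "x-goog-api-key": os.environ.get("GEMINI_API_KEY", ""),
        "Content-Type": "application/json",
    },
    method="POST",
)
# urllib.request.urlopen(request) would send it; omitted here to avoid a live call.
```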

Variants

Streaming response

Use streaming for real-time token-by-token output to improve user experience on long responses.

python
import os
from google import genai

def main():
    api_key = os.environ.get("GEMINI_API_KEY")
    if not api_key:
        raise ValueError("GEMINI_API_KEY environment variable not set")

    client = genai.Client(api_key=api_key)

    # generate_content_stream yields partial responses as tokens arrive.
    for chunk in client.models.generate_content_stream(
        model="gemini-1.5-pro",
        contents="Stream this response.",
    ):
        print(chunk.text, end="", flush=True)

if __name__ == "__main__":
    main()

Async authentication and completion

Use async calls when integrating Gemini API in applications requiring concurrency or non-blocking IO.

python
import os
import asyncio
from google import genai

async def main():
    api_key = os.environ.get("GEMINI_API_KEY")
    if not api_key:
        raise ValueError("GEMINI_API_KEY environment variable not set")

    client = genai.Client(api_key=api_key)

    # The async surface lives under client.aio.
    response = await client.aio.models.generate_content(
        model="gemini-1.5-pro",
        contents="Async hello!",
    )
    print("Gemini async response:", response.text)

if __name__ == "__main__":
    asyncio.run(main())

Use alternative model gemini-2.0-flash

Use the gemini-2.0-flash model for faster, lower-latency completions at a lower cost per token; check the model documentation for its context window.

python
import os
from google import genai

def main():
    api_key = os.environ.get("GEMINI_API_KEY")
    if not api_key:
        raise ValueError("GEMINI_API_KEY environment variable not set")

    client = genai.Client(api_key=api_key)

    response = client.models.generate_content(
        model="gemini-2.0-flash",
        contents="Use the flash model for faster response.",
    )
    print("Gemini flash model response:", response.text)

if __name__ == "__main__":
    main()

Performance

Latency: ~700ms for a gemini-1.5-pro non-streaming completion
Cost: ~$0.003 per 500 tokens for gemini-1.5-pro
Rate limits: Tier 1: 600 requests per minute, 40,000 tokens per minute

  • Use concise prompts to reduce token usage
  • Limit max output tokens (GenerateContentConfig.max_output_tokens) to only what you need
  • Reuse context when possible to avoid repeated tokens

Approach | Latency | Cost/call | Best for
Standard completion | ~700ms | ~$0.003 | General-purpose completions
Streaming completion | First tokens arrive immediately, total ~700ms | ~$0.003 | Real-time UI updates
Async completion | ~700ms (non-blocking) | ~$0.003 | Concurrent or async apps
gemini-2.0-flash model | ~400ms | ~$0.002 | Low-latency responses

Quick tip

Always store your Gemini API key securely in environment variables and never hardcode it in your source code.

Common mistake

A common mistake is forgetting to set the GEMINI_API_KEY environment variable, causing authentication failures.
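One way to surface that mistake early is a fail-fast startup check; a minimal sketch (require_api_key is a hypothetical helper of ours, not part of the SDK):

```python
import os

def require_api_key() -> str:
    # Hypothetical helper: raise a clear, actionable error at startup
    # instead of letting the client fail later with an opaque auth error.
    key = os.environ.get("GEMINI_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "GEMINI_API_KEY is not set; run `export GEMINI_API_KEY=...` first"
        )
    return key
```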

Verified 2026-04 · gemini-1.5-pro, gemini-2.0-flash