Code beginner · 3 min read

How to generate text with the Gemini API in Python

Direct answer
Use the Gemini API through its OpenAI-compatible endpoint: import the OpenAI client from openai, initialize it with your Gemini API key from os.environ and the base_url https://generativelanguage.googleapis.com/v1beta/openai/, then call client.chat.completions.create with model gemini-1.5-pro and your messages.

Setup

Install
bash
pip install openai
Env vars
GEMINI_API_KEY
Imports
python
import os
from openai import OpenAI

Examples

In: Hello, how are you?
Out: I'm doing great, thank you! How can I assist you today?
In: Write a short poem about spring.
Out: Spring whispers softly, blooms awake anew, Colors dance in sunlight, skies painted blue.
In: Explain quantum computing in simple terms.
Out: Quantum computing uses tiny particles that can be in many states at once, helping solve complex problems faster than regular computers.

Integration steps

  1. Install the OpenAI Python SDK and set your Gemini API key in the environment variable GEMINI_API_KEY.
  2. Import the OpenAI client and initialize it with your API key from os.environ and the Gemini OpenAI-compatible base_url (https://generativelanguage.googleapis.com/v1beta/openai/).
  3. Create a messages list with a user role and your prompt content.
  4. Call client.chat.completions.create with model 'gemini-1.5-pro' and the messages list.
  5. Extract the generated text from response.choices[0].message.content.
  6. Use or display the generated text as needed.

Full code

python
import os
from openai import OpenAI

# Initialize client against the Gemini OpenAI-compatible endpoint
client = OpenAI(
    api_key=os.environ["GEMINI_API_KEY"],
    base_url="https://generativelanguage.googleapis.com/v1beta/openai/",
)

# Define the prompt message
messages = [{"role": "user", "content": "Write a short story about a robot learning to love."}]

# Call the Gemini model for text generation
response = client.chat.completions.create(
    model="gemini-1.5-pro",
    messages=messages
)

# Extract and print the generated text
text = response.choices[0].message.content
print("Generated Text:\n", text)
output
Generated Text:
Once upon a time, in a world of circuits and code, a robot named Aria discovered the warmth of friendship and the beauty of love...

API trace

Request
json
{"model": "gemini-1.5-pro", "messages": [{"role": "user", "content": "Write a short story about a robot learning to love."}]}
Response
json
{"choices": [{"message": {"content": "Once upon a time, in a world of circuits and code..."}}], "usage": {"total_tokens": 150}}
Extract: response.choices[0].message.content
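The raw JSON in the trace maps directly onto the client's response object, and the same path extracts the text. As a quick sanity check you can parse the trace yourself and pull out both the text and the token count (the JSON literal below is copied from the trace, not a live response):

```python
import json

# Response body copied from the API trace above
raw = '{"choices": [{"message": {"content": "Once upon a time, in a world of circuits and code..."}}], "usage": {"total_tokens": 150}}'
data = json.loads(raw)

# Same path as the client object: choices[0].message.content
text = data["choices"][0]["message"]["content"]
tokens = data["usage"]["total_tokens"]
print(tokens, text[:16])  # → 150 Once upon a time
```

The usage block is also where you read token consumption for cost tracking.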

Variants

Streaming Text Generation

Use streaming when you want to display the text as it is generated for a more interactive user experience.

python
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["GEMINI_API_KEY"],
    base_url="https://generativelanguage.googleapis.com/v1beta/openai/",
)

messages = [{"role": "user", "content": "Tell me a joke."}]

# Stream the response for better UX
response = client.chat.completions.create(
    model="gemini-1.5-pro",
    messages=messages,
    stream=True
)

for chunk in response:
    # In openai>=1.0, delta is an object, not a dict; content may be None
    delta = chunk.choices[0].delta
    if delta.content:
        print(delta.content, end="")
print()
Async Text Generation

Use async calls to handle multiple concurrent requests efficiently in asynchronous Python applications.

python
import os
import asyncio
from openai import AsyncOpenAI

async def generate_text():
    # AsyncOpenAI exposes awaitable methods; acreate() no longer exists in openai>=1.0
    client = AsyncOpenAI(
        api_key=os.environ["GEMINI_API_KEY"],
        base_url="https://generativelanguage.googleapis.com/v1beta/openai/",
    )
    messages = [{"role": "user", "content": "Explain AI in simple terms."}]
    response = await client.chat.completions.create(
        model="gemini-1.5-pro",
        messages=messages
    )
    print(response.choices[0].message.content)

asyncio.run(generate_text())
Using Gemini Flash Model

Use the 'gemini-1.5-flash' model when you want lower latency and cost, at some trade-off in output quality.

python
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["GEMINI_API_KEY"],
    base_url="https://generativelanguage.googleapis.com/v1beta/openai/",
)

messages = [{"role": "user", "content": "Summarize the latest tech news."}]

response = client.chat.completions.create(
    model="gemini-1.5-flash",
    messages=messages
)

print(response.choices[0].message.content)

Performance

Latency: ~700ms for gemini-1.5-pro non-streaming calls
Cost: ~$0.003 per 500 tokens generated
Rate limits: Tier 1: 600 RPM / 40K TPM
  • Keep prompts concise to reduce token usage.
  • Use shorter system instructions when possible.
  • Reuse context efficiently to avoid repeating tokens.
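One lightweight way to act on these tips is to budget a prompt's size before sending it. The helper below is a rough sketch, assuming the common ~4-characters-per-token heuristic (a real tokenizer will count differently):

```python
def trim_prompt(prompt: str, max_tokens: int, chars_per_token: int = 4) -> str:
    """Cap a prompt's approximate token footprint by trimming on character count."""
    budget = max_tokens * chars_per_token
    return prompt if len(prompt) <= budget else prompt[:budget]

long_prompt = "Summarize this article. " * 50  # 1200 characters
print(len(trim_prompt(long_prompt, max_tokens=100)))  # → 400
```

For precise budgeting you would use the model's own tokenizer; this heuristic only keeps you in the right order of magnitude.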
Approach | Latency | Cost/call | Best for
Standard call (gemini-1.5-pro) | ~700ms | ~$0.003 | General-purpose text generation
Streaming call | Starts immediately, total ~700ms | ~$0.003 | Interactive applications needing partial output
Async call | ~700ms | ~$0.003 | Concurrent requests in async apps
Gemini Flash model | ~400ms | ~$0.002 | Faster, cost-effective responses

Quick tip

Keep your API key in an environment variable rather than hardcoding it in source, and use the latest Gemini model name for best results.

Common mistake

Beginners often pass the prompt as a bare string instead of a list of dictionaries with role and content keys, which causes API errors.
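A minimal illustration of the difference (the commented-out bare string is the shape that fails):

```python
# Wrong: a bare string is rejected by the chat completions endpoint
# messages = "Hello, how are you?"

# Right: a list of dicts, each with "role" and "content" keys
messages = [{"role": "user", "content": "Hello, how are you?"}]

print(messages[0]["role"])  # → user
```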

Verified 2026-04 · gemini-1.5-pro, gemini-1.5-flash