Code beginner · 3 min read

How to generate text with the Gemini API in Python

Direct answer
Use the Gemini API through its OpenAI-compatible endpoint: import the OpenAI client from openai, initialize it with your Gemini API key from os.environ and the base_url https://generativelanguage.googleapis.com/v1beta/openai/, then call client.chat.completions.create with model gemini-1.5-pro and your messages.

Setup

Install
bash
pip install openai
Env vars
GEMINI_API_KEY
Imports
python
import os
from openai import OpenAI

Examples

In: Hello, how are you?
Out: I'm doing great, thank you! How can I assist you today?
In: Write a short poem about spring.
Out: Spring whispers softly, blooms awake anew, Colors dance in sunlight, skies painted blue.
In: Explain quantum computing in simple terms.
Out: Quantum computing uses tiny particles that can be in many states at once, helping solve complex problems faster than regular computers.

Integration steps

  1. Install the OpenAI Python SDK and set your Gemini API key in the environment variable GEMINI_API_KEY.
  2. Import the OpenAI client and initialize it with your API key from os.environ and the Gemini OpenAI-compatible base_url (https://generativelanguage.googleapis.com/v1beta/openai/).
  3. Create a messages list with a user role and your prompt content.
  4. Call client.chat.completions.create with model 'gemini-1.5-pro' and the messages list.
  5. Extract the generated text from response.choices[0].message.content.
  6. Use or display the generated text as needed.

Full code

python
import os
from openai import OpenAI

# Initialize client against the Gemini OpenAI-compatible endpoint
client = OpenAI(
    api_key=os.environ["GEMINI_API_KEY"],
    base_url="https://generativelanguage.googleapis.com/v1beta/openai/",
)

# Define the prompt message
messages = [{"role": "user", "content": "Write a short story about a robot learning to love."}]

# Call the Gemini model for text generation
response = client.chat.completions.create(
    model="gemini-1.5-pro",
    messages=messages
)

# Extract and print the generated text
text = response.choices[0].message.content
print("Generated Text:\n", text)
output
Generated Text:
Once upon a time, in a world of circuits and code, a robot named Aria discovered the warmth of friendship and the beauty of love...

API trace

Request
json
{"model": "gemini-1.5-pro", "messages": [{"role": "user", "content": "Write a short story about a robot learning to love."}]}
Response
json
{"choices": [{"message": {"content": "Once upon a time, in a world of circuits and code..."}}], "usage": {"total_tokens": 150}}
Extract: response.choices[0].message.content
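The raw JSON in the trace maps directly onto the client's response object, and the same path extracts the text. As a quick sanity check you can parse the trace yourself and pull out both the text and the token count (the JSON literal below is copied from the trace, not a live response):

```python
import json

# Response body copied from the API trace above
raw = '{"choices": [{"message": {"content": "Once upon a time, in a world of circuits and code..."}}], "usage": {"total_tokens": 150}}'
data = json.loads(raw)

# Same path as the client object: choices[0].message.content
text = data["choices"][0]["message"]["content"]
tokens = data["usage"]["total_tokens"]
print(tokens, text[:16])  # → 150 Once upon a time
```

The usage block is also where you read token consumption for cost tracking.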

Variants

Streaming Text Generation

Use streaming when you want to display the text as it is generated for a more interactive user experience.

python
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["GEMINI_API_KEY"],
    base_url="https://generativelanguage.googleapis.com/v1beta/openai/",
)

messages = [{"role": "user", "content": "Tell me a joke."}]

# Stream the response for better UX
response = client.chat.completions.create(
    model="gemini-1.5-pro",
    messages=messages,
    stream=True
)

for chunk in response:
    # In openai>=1.0, delta is an object, not a dict; content may be None
    delta = chunk.choices[0].delta
    if delta.content:
        print(delta.content, end="")
print()
Async Text Generation

Use async calls to handle multiple concurrent requests efficiently in asynchronous Python applications.

python
import os
import asyncio
from openai import AsyncOpenAI

async def generate_text():
    # AsyncOpenAI exposes awaitable methods; acreate() no longer exists in openai>=1.0
    client = AsyncOpenAI(
        api_key=os.environ["GEMINI_API_KEY"],
        base_url="https://generativelanguage.googleapis.com/v1beta/openai/",
    )
    messages = [{"role": "user", "content": "Explain AI in simple terms."}]
    response = await client.chat.completions.create(
        model="gemini-1.5-pro",
        messages=messages
    )
    print(response.choices[0].message.content)

asyncio.run(generate_text())
Using Gemini Flash Model

Use the 'gemini-1.5-flash' model when you want lower latency and cost, at some trade-off in output quality.

python
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["GEMINI_API_KEY"],
    base_url="https://generativelanguage.googleapis.com/v1beta/openai/",
)

messages = [{"role": "user", "content": "Summarize the latest tech news."}]

response = client.chat.completions.create(
    model="gemini-1.5-flash",
    messages=messages
)

print(response.choices[0].message.content)

Performance

Latency: ~700ms for gemini-1.5-pro non-streaming calls
Cost: ~$0.003 per 500 tokens generated
Rate limits: Tier 1: 600 RPM / 40K TPM
  • Keep prompts concise to reduce token usage.
  • Use shorter system instructions when possible.
  • Reuse context efficiently to avoid repeating tokens.
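One lightweight way to act on these tips is to budget a prompt's size before sending it. The helper below is a rough sketch, assuming the common ~4-characters-per-token heuristic (a real tokenizer will count differently):

```python
def trim_prompt(prompt: str, max_tokens: int, chars_per_token: int = 4) -> str:
    """Cap a prompt's approximate token footprint by trimming on character count."""
    budget = max_tokens * chars_per_token
    return prompt if len(prompt) <= budget else prompt[:budget]

long_prompt = "Summarize this article. " * 50  # 1200 characters
print(len(trim_prompt(long_prompt, max_tokens=100)))  # → 400
```

For precise budgeting you would use the model's own tokenizer; this heuristic only keeps you in the right order of magnitude.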
Approach | Latency | Cost/call | Best for
Standard call (gemini-1.5-pro) | ~700ms | ~$0.003 | General-purpose text generation
Streaming call | Starts immediately, total ~700ms | ~$0.003 | Interactive applications needing partial output
Async call | ~700ms | ~$0.003 | Concurrent requests in async apps
Gemini Flash model | ~400ms | ~$0.002 | Faster, cost-effective responses

Quick tip

Keep your API key in an environment variable rather than hardcoding it in source, and use the latest Gemini model name for best results.

Common mistake

Beginners often pass the prompt as a bare string instead of a list of dictionaries with role and content keys, which causes API errors.
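A minimal illustration of the difference (the commented-out bare string is the shape that fails):

```python
# Wrong: a bare string is rejected by the chat completions endpoint
# messages = "Hello, how are you?"

# Right: a list of dicts, each with "role" and "content" keys
messages = [{"role": "user", "content": "Hello, how are you?"}]

print(messages[0]["role"])  # → user
```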

Verified 2026-04 · gemini-1.5-pro, gemini-1.5-flash