How to generate text with the Gemini API in Python
Direct answer
Use the Gemini API through its OpenAI-compatible endpoint: import the OpenAI client from openai, initialize it with your Gemini API key and the Gemini base URL, and call client.chat.completions.create with model gemini-1.5-pro and your messages.
Setup
Install
pip install openai
Env vars
GEMINI_API_KEY
Imports
import os
from openai import OpenAI
Examples
in: Hello, how are you?
out: I'm doing great, thank you! How can I assist you today?
in: Write a short poem about spring.
out: Spring whispers softly, blooms awake anew,
Colors dance in sunlight, skies painted blue.
in: Explain quantum computing in simple terms.
out: Quantum computing uses tiny particles that can be in many states at once, helping solve complex problems faster than regular computers.
Integration steps
- Install the OpenAI Python SDK and set your Gemini API key in the environment variable GEMINI_API_KEY.
- Import the OpenAI client and initialize it with your API key and Gemini's OpenAI-compatible base URL.
- Create a messages list with a user role and your prompt content.
- Call client.chat.completions.create with model 'gemini-1.5-pro' and the messages list.
- Extract the generated text from response.choices[0].message.content.
- Use or display the generated text as needed.
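Under the hood, these steps culminate in a single HTTPS POST. As a stdlib-only sketch (no SDK or network needed), the following builds the equivalent request against Gemini's OpenAI-compatible endpoint without sending it; the base URL is taken from Google's OpenAI-compatibility docs, and build_request is an illustrative helper, not part of any SDK.

```python
import json

# Gemini's OpenAI-compatible base URL (per Google's compatibility docs)
GEMINI_BASE_URL = "https://generativelanguage.googleapis.com/v1beta/openai/"

def build_request(api_key: str, model: str, prompt: str):
    """Assemble the URL, headers, and JSON body for a chat completion call."""
    url = GEMINI_BASE_URL + "chat/completions"
    headers = {
        "Authorization": f"Bearer {api_key}",  # bearer auth, OpenAI wire format
        "Content-Type": "application/json",
    }
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    })
    return url, headers, body

url, headers, body = build_request("demo-key", "gemini-1.5-pro", "Hello!")
print(url)
```

In practice the SDK handles all of this for you; the sketch just makes the wire format visible.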
Full code
import os
from openai import OpenAI
# Initialize client with API key from environment
client = OpenAI(
    api_key=os.environ["GEMINI_API_KEY"],
    base_url="https://generativelanguage.googleapis.com/v1beta/openai/",
)
# Define the prompt message
messages = [{"role": "user", "content": "Write a short story about a robot learning to love."}]
# Call the Gemini model for text generation
response = client.chat.completions.create(
model="gemini-1.5-pro",
messages=messages
)
# Extract and print the generated text
text = response.choices[0].message.content
print("Generated Text:\n", text)
Output
Generated Text:
Once upon a time, in a world of circuits and code, a robot named Aria discovered the warmth of friendship and the beauty of love...
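The extraction step can be exercised offline before looking at the raw trace below: this stdlib sketch parses a response body shaped like the one in the API trace and pulls out the generated text and token count, mirroring the response.choices[0].message.content access path.

```python
import json

# Example response body matching the API trace in this document
raw = ('{"choices": [{"message": {"content": '
       '"Once upon a time, in a world of circuits and code..."}}], '
       '"usage": {"total_tokens": 150}}')

response = json.loads(raw)

# Same path the SDK exposes as response.choices[0].message.content
text = response["choices"][0]["message"]["content"]
tokens = response["usage"]["total_tokens"]

print(text)
print(tokens)
```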
API trace
Request
{"model": "gemini-1.5-pro", "messages": [{"role": "user", "content": "Write a short story about a robot learning to love."}]}
Response
{"choices": [{"message": {"content": "Once upon a time, in a world of circuits and code..."}}], "usage": {"total_tokens": 150}}
Extract
response.choices[0].message.content
Variants
Streaming Text Generation ›
Use streaming when you want to display the text as it is generated for a more interactive user experience.
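Because each streamed chunk carries only a fragment in delta.content, it helps to see the assembly loop in isolation first. This stdlib-only simulation uses fake chunk objects standing in for the SDK's stream (no network or API key needed); the real streaming call follows below.

```python
from types import SimpleNamespace

# Fake stream: each chunk mimics chunk.choices[0].delta.content from the SDK.
# The final chunk carries None, as real streams often end with an empty delta.
def fake_stream():
    for piece in ["Why did the ", "robot cross ", "the road?", None]:
        delta = SimpleNamespace(content=piece)
        yield SimpleNamespace(choices=[SimpleNamespace(delta=delta)])

parts = []
for chunk in fake_stream():
    text = chunk.choices[0].delta.content or ""  # None-safe, like the real loop
    parts.append(text)
    print(text, end="")
print()

full = "".join(parts)
```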
import os
from openai import OpenAI
client = OpenAI(
    api_key=os.environ["GEMINI_API_KEY"],
    base_url="https://generativelanguage.googleapis.com/v1beta/openai/",
)
messages = [{"role": "user", "content": "Tell me a joke."}]
# Stream the response for better UX
response = client.chat.completions.create(
model="gemini-1.5-pro",
messages=messages,
stream=True
)
for chunk in response:
    print(chunk.choices[0].delta.content or "", end="")
print()
Async Text Generation ›
Use async calls to handle multiple concurrent requests efficiently in asynchronous Python applications.
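The payoff of async is concurrency. This stdlib-only sketch stubs out the model call (fake_generate is a hypothetical stand-in, not an SDK function) to show the asyncio.gather fan-out pattern; swap in the real client call to go live.

```python
import asyncio

# Stub standing in for an awaited client.chat.completions.create call
async def fake_generate(prompt: str) -> str:
    await asyncio.sleep(0.01)  # simulate network latency
    return f"answer to: {prompt}"

async def main() -> list[str]:
    prompts = ["Explain AI.", "Define ML.", "What is NLP?"]
    # gather runs all three "requests" concurrently instead of one by one
    return await asyncio.gather(*(fake_generate(p) for p in prompts))

results = asyncio.run(main())
print(results)
```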
import os
import asyncio
from openai import AsyncOpenAI

async def generate_text():
    client = AsyncOpenAI(
        api_key=os.environ["GEMINI_API_KEY"],
        base_url="https://generativelanguage.googleapis.com/v1beta/openai/",
    )
    messages = [{"role": "user", "content": "Explain AI in simple terms."}]
    response = await client.chat.completions.create(
        model="gemini-1.5-pro",
        messages=messages
    )
    print(response.choices[0].message.content)

asyncio.run(generate_text())
Using Gemini Flash Model ›
Use the 'gemini-1.5-flash' model when you want lower latency and cost, at a modest quality trade-off.
import os
from openai import OpenAI
client = OpenAI(
    api_key=os.environ["GEMINI_API_KEY"],
    base_url="https://generativelanguage.googleapis.com/v1beta/openai/",
)
messages = [{"role": "user", "content": "Summarize the latest tech news."}]
response = client.chat.completions.create(
model="gemini-1.5-flash",
messages=messages
)
print(response.choices[0].message.content)
Performance
Latency: ~700ms for gemini-1.5-pro non-streaming calls
Cost: ~$0.003 per 500 tokens generated
Rate limits: Tier 1: 600 RPM / 40K TPM
- Keep prompts concise to reduce token usage.
- Use shorter system instructions when possible.
- Reuse context efficiently to avoid repeating tokens.
| Approach | Latency | Cost/call | Best for |
|---|---|---|---|
| Standard call (gemini-1.5-pro) | ~700ms | ~$0.003 | General purpose text generation |
| Streaming call | Starts immediately, total ~700ms | ~$0.003 | Interactive applications needing partial output |
| Async call | ~700ms | ~$0.003 | Concurrent requests in async apps |
| Gemini Flash model | ~400ms | ~$0.002 | Faster, cost-effective responses |
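Taking the quoted rate of ~$0.003 per 500 tokens at face value (an assumption for illustration; real pricing varies by model and changes over time), a back-of-the-envelope cost estimator:

```python
# Assumed rate from the table above: ~$0.003 per 500 generated tokens
RATE_PER_500_TOKENS = 0.003

def estimate_cost(total_tokens: int) -> float:
    """Rough cost in USD for a given token count at the assumed rate."""
    return total_tokens / 500 * RATE_PER_500_TOKENS

# The example response above reported 150 total tokens
print(round(estimate_cost(150), 6))     # 0.0009
print(round(estimate_cost(10_000), 4))  # 0.06
```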
Quick tip
Store your API key in an environment variable rather than hard-coding it, and check the current list of Gemini model names, since older model versions are retired over time.
Common mistake
Beginners often forget to pass the messages as a list of dictionaries with roles, causing API errors.
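This mistake is easy to catch before the request goes out. The validator below is a hypothetical helper (not part of any SDK) that rejects a bare string and accepts a properly shaped list of role/content dicts.

```python
def validate_messages(messages) -> None:
    """Raise if messages is not a list of dicts with 'role' and 'content' keys."""
    if not isinstance(messages, list):
        raise TypeError(f"messages must be a list, got {type(messages).__name__}")
    for m in messages:
        if not isinstance(m, dict) or "role" not in m or "content" not in m:
            raise ValueError("each message needs 'role' and 'content' keys")

# Wrong: a bare string instead of a list of dicts
try:
    validate_messages("Hello!")
except TypeError as e:
    print("rejected:", e)

# Right: a list of role/content dicts
validate_messages([{"role": "user", "content": "Hello!"}])
print("ok")
```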