API Intermediate medium · 6 min

model.start_chat(): conversation object

What you will learn
Start a multi-turn conversation with the Gemini API using <code>model.start_chat()</code> to maintain context across message exchanges.

Why this matters

Real applications need conversational memory: a chatbot, debugging assistant, or code review tool can't restart from scratch for every user message. <code>start_chat()</code> handles context and turn order automatically, eliminating the manual work of appending/managing message history.

Skip if: Use <code>model.generate_content()</code> directly when you have a single prompt with no follow-up questions, or when building stateless request-response pipelines where context persistence creates unwanted coupling. For truly ephemeral interactions (one-shot summaries, batch processing), the overhead of conversation objects is unnecessary.

Explanation

What it does: model.start_chat() returns a ChatSession object that maintains conversation history and role tracking (user/model). Each call to send_message() automatically appends your message and the model's response to an internal history list, so the next turn sees the full context.

How it works: The chat object stores a list of Content objects representing the conversation thread. When you call send_message(prompt), the SDK packages your message and the entire history into a single API request to Gemini. The model sees the conversation arc, not isolated prompts. Response messages are automatically added to history for the next turn.

When to use it: Use this for any interactive experience: user-facing chatbots, multi-step problem solving, iterative refinement workflows, or debugging conversations where context from earlier turns directly influences the next response. It's also the idiomatic way to build conversation in Gemini unlike single-call APIs where you'd manually manage message lists.

Request code

python
import google.generativeai as genai
import os

genai.configure(api_key=os.environ['GOOGLE_API_KEY'])

model = genai.GenerativeModel('gemini-2.0-flash')
chat = model.start_chat(history=[])

response1 = chat.send_message('Explain quantum entanglement in one sentence')
print('Assistant:', response1.text)
print('Message count:', len(chat.history))

response2 = chat.send_message('Now explain it to a five-year-old')
print('Assistant:', response2.text)
print('Message count:', len(chat.history))

response3 = chat.send_message('What was the first thing I asked you?')
print('Assistant:', response3.text)

Authentication

Ensure your API key is set before instantiating the model. The SDK reads GOOGLE_API_KEY from environment or accepts it via genai.configure(api_key='your-key'). The conversation object inherits this authentication: no additional setup per chat.

Response shape

FieldDescription
text The assistant's response as a plain string
parts List of Content.Part objects (usually contains one text part)
finish_reason Enum indicating why generation stopped (STOP, MAX_TOKENS, etc.)
usage_metadata Object with prompt_token_count, candidates_token_count, total_token_count

Field guide

text

The primary field you'll use 99% of the time: the actual model response as a string

parts

Lower-level access to response components; useful if you need to inspect token counts or check for specific content types before accessing .text

finish_reason

Often overlooked but critical in production: <code>STOP</code> means normal completion, but <code>MAX_TOKENS</code> means the response was truncated: your answer is incomplete and should be handled differently

usage_metadata

Developers commonly ignore this, but it's the only way to track token consumption per turn without external logging: essential for cost monitoring in high-volume systems

Setup trap

Passing a non-empty history list to start_chat() is a subtle footgun. The history must alternate perfectly: user, model, user, model. If you pass history with consecutive user messages or incorrect role tags, the API silently accepts it but treats the conversation as malformed on the next send_message(): producing errors that blame your prompt, not the history structure.

Cost

Each <code>send_message()</code> includes the <em>entire conversation history</em> in the request, not just your new message. A 10-turn conversation where each user message is 100 tokens and each response is 200 tokens will cost you tokens for all prior messages again on turn 11. For long conversations, consider summarizing history or using separate chat sessions to avoid token explosion.

Rate limits

Rapid <code>send_message()</code> calls (e.g., loop of 10 messages in 2 seconds) will hit rate limits before single-call APIs. The Gemini API allows ~10 requests per minute for free tier. Long conversations with human delays won't trigger this, but automated multi-turn workflows need exponential backoff.

Common gotcha

Modifying the chat.history list directly after send_message() feels intuitive but breaks turn order. Each new send_message() expects history in the exact structure the API maintains. If you append, remove, or reorder messages manually, the next API call may fail with a 'malformed conversation' error or produce incoherent responses because turn roles are scrambled.

Error recovery

InvalidArgument: INVALID_ARGUMENT
History contains misaligned roles or malformed Content.Part objects. Verify every even-indexed message is user role and odd-indexed is model role. Use <code>print(chat.history)</code> to inspect structure before next send_message().
ResourceExhausted
Rate limit hit. Add exponential backoff: <code>import time; time.sleep(2 ** attempt)</code> between retries, capping at 60 seconds.
DeadlineExceeded
Request timeout, usually due to very long history (1000+ tokens) or model overload. Try shortening history or use a faster model like gemini-2.0-flash instead of gemini-2.5-pro.
Unauthenticated
API key not set or expired. Verify <code>os.environ['GOOGLE_API_KEY']</code> exists and <code>genai.configure()</code> was called before <code>start_chat()</code>.

Experienced dev note

The ChatSession object's automatic history management is a gift and a trap. Gift: you never manually append/format messages. Trap: the history is mutable and kept in memory: long-running applications leak memory if you spawn unbounded chats. In production, either bound chat lifetime (max 50 turns per chat, then restart) or externalize history to a database and rebuild the ChatSession periodically. Also, history is not persisted across process restarts: if you need conversation recovery, export chat.history to JSON before shutting down, then rebuild via model.start_chat(history=loaded_history).

Check your understanding

You're building a customer support chatbot. A user sends 5 messages over 30 minutes, then your process crashes and restarts. The next message from the user produces a response that ignores context from their first 3 messages. Why, and what's the fix?

Show answer hint

The crash destroyed the in-memory ChatSession object. History is not persisted by the API itself: you must save it before shutdown and reload it into a new ChatSession. The Gemini API has no server-side session storage like traditional chatbots.

VERSION google-generativeai 0.8.x uses ChatSession.send_message(). Earlier 0.1.x versions used deprecated ChatSession.send_messages() (plural): upgrade to avoid maintenance debt. The history structure and role enumeration are stable across 0.8.x.

Community Notes

No notes yetBe the first to share a version-specific fix or tip.