Use cases: data analysis, math, plotting
Why this matters
Gemini can reason about numerical data and generate executable Python to analyze it, so you don't have to write the analysis code yourself: useful for exploratory analysis, quick calculations, and auto-generating visualization code from raw datasets.
Explanation
What it does: Gemini can accept raw data (CSV-like text, JSON, or plain tables) and generate analysis, perform math, or write plotting code. You send the data in your prompt and Gemini returns executable Python or mathematical answers.
How it works: The model reads your data as text, infers its structure and your intent, then either generates Python code (typically using pandas, matplotlib, or seaborn) or answers directly. Unlike a statistics API, a plain `generate_content` call doesn't execute anything: numbers in the reply are predicted by the model and should be verified, and code in the reply only runs when you run it. Because the data travels inside the prompt, input tokens grow roughly in proportion to file size. That makes the approach cheap for small datasets and expensive for large CSVs.
When to use it: Quick exploratory analysis, auto-generating boilerplate plotting code, explaining mathematical approaches, handling ad-hoc data formats the model can parse. Not for production pipelines requiring bulletproof math or handling multi-GB datasets.
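Because the reply is plain text, any code in it has to be pulled out and checked before you run it. The sketch below assumes the model may wrap its answer in a ```python fence even when asked not to, and uses two hypothetical helpers (`extract_python_code`, `check_generated_code`; neither is part of `google.generativeai`) that only parse the code with Python's `ast` module and never execute it.

```python
import ast
import re

# Matches a fenced block such as ```python ... ``` (language tag optional).
_FENCE_RE = re.compile(r"```(?:python|py)?[ \t]*\n(.*?)```", re.DOTALL)


def extract_python_code(reply: str) -> str:
    """Hypothetical helper: first fenced block in the reply, or the whole reply if unfenced."""
    match = _FENCE_RE.search(reply)
    return (match.group(1) if match else reply).strip()


def check_generated_code(code: str, allowed_modules: set[str]) -> list[str]:
    """Hypothetical helper: parse without running; return problems found (empty if none)."""
    try:
        tree = ast.parse(code)
    except SyntaxError as exc:
        return [f"syntax error on line {exc.lineno}: {exc.msg}"]
    problems = []
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            names = [node.module or ""]
        else:
            continue
        for name in names:
            # Compare only the top-level package, e.g. "matplotlib" for "matplotlib.pyplot".
            if name.split(".")[0] not in allowed_modules:
                problems.append(f"import of disallowed module: {name}")
    return problems


reply = "```python\nimport matplotlib.pyplot as plt\nplt.plot([1, 2])\n```"
code = extract_python_code(reply)
print(check_generated_code(code, {"matplotlib", "pandas"}))  # []
```

An import allowlist catches obvious surprises, but it is not a sandbox: run generated code only in an environment you are willing to let it touch.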
Request code
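```python
import os

import google.generativeai as genai

# Configure the SDK with the key stored in the environment (see Authentication below).
genai.configure(api_key=os.environ['GOOGLE_API_KEY'])
model = genai.GenerativeModel('gemini-2.0-flash')

# A small CSV-style dataset pasted directly into the prompt.
data = """Date,Sales,Region
2024-01-01,15000,North
2024-01-02,18500,South
2024-01-03,12000,East
2024-01-04,22000,West
2024-01-05,19500,North"""

# State the output format explicitly so the reply is code rather than prose.
prompt = f"""Analyze this sales data and generate Python code to plot daily sales by region using matplotlib. Return only executable Python code, no explanation.
{data}"""

response = model.generate_content(prompt)
# The reply is generated code: review it before running it.
print(response.text)
```

Authentication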
Set your Google API key before configuring the SDK. The code above reads it from the environment when `genai.configure()` runs, so set it with `export GOOGLE_API_KEY='your-key'` in your shell, or with `os.environ['GOOGLE_API_KEY'] = 'your-key'` in Python before calling `genai.configure()`.
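A missing variable otherwise surfaces as a bare `KeyError` from `os.environ[...]`. A minimal sketch of failing fast with a clearer message, using a hypothetical `require_api_key` helper (not part of the SDK):

```python
import os


def require_api_key(var_name: str = "GOOGLE_API_KEY") -> str:
    """Hypothetical helper: return the key from the environment or fail with a clear message."""
    key = os.environ.get(var_name, "").strip()
    if not key:
        raise RuntimeError(
            f"{var_name} is not set; run `export {var_name}=...` before starting Python"
        )
    return key
```

Usage: `genai.configure(api_key=require_api_key())`.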
Response shape
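| Field | Description |
|---|---|
| `text` | String containing the generated analysis, code, or mathematical explanation |
| `usage_metadata` | Token counts for the request: `prompt_token_count`, `candidates_token_count`, `total_token_count` |
| `candidates[0].finish_reason` | Enum indicating why generation stopped (`STOP`, `MAX_TOKENS`, `SAFETY`, etc.) |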
Field guide
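text: Your main output. For data analysis use cases this is executable Python code or written analysis. Always review generated code before running it.
usage_metadata: Critical for cost tracking. Estimated cost is `prompt_token_count` × (input price per 1M tokens) / 1,000,000 plus `candidates_token_count` × (output price per 1M tokens) / 1,000,000. Input and output tokens are priced differently, so take both rates from Google's current pricing for gemini-2.0-flash rather than hard-coding them. Developers who skip this are often surprised by their bills when processing large datasets.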
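A sketch of pulling the fields above into one loggable record, assuming the response shape in the table (`candidates[0].finish_reason` plus the `usage_metadata` token counts). `summarize_response` is a hypothetical helper; it only reads attributes, so it can be tried on a stand-in object.

```python
def summarize_response(response) -> dict:
    """Hypothetical helper: collect loggable fields from a generate_content() response."""
    reason = response.candidates[0].finish_reason
    usage = response.usage_metadata
    return {
        # finish_reason is an enum in the SDK; fall back to str() for plain values.
        "finish_reason": getattr(reason, "name", str(reason)),
        "prompt_tokens": usage.prompt_token_count,
        "output_tokens": usage.candidates_token_count,
        "total_tokens": usage.total_token_count,
    }
```

Checking `finish_reason` before using `text` matters here: `MAX_TOKENS` means the generated code was cut off mid-way and will likely not run.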
Cost
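Cost scales with the size of the data you paste. By Google's rough guide of about 4 characters per token, a 10,000-row CSV shaped like the sample above (about 23 characters per row) is roughly 230 KB, on the order of 60,000 input tokens per request. A 1M-row file would be several million tokens, which exceeds gemini-2.0-flash's roughly 1M-token input limit altogether. Measure with `model.count_tokens(prompt)` before sending. Uploading the file with the File API (`genai.upload_file`) avoids pasting it inline, but its tokens are still billed each time a prompt references it; to actually cut cost, aggregate the data locally first or use context caching for repeated queries over the same data.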
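A rough calculator for sanity-checking a request before sending it. This is a minimal sketch under stated assumptions: the ~4 characters per token heuristic (numeric CSVs may tokenize less efficiently, so it can undercount), and placeholder prices that you should replace with current rates. For a real count, `model.count_tokens(prompt).total_tokens` asks the API directly.

```python
def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Rough token count from the ~4 characters per token rule of thumb."""
    return round(len(text) / chars_per_token)


def estimate_cost_usd(
    prompt_tokens: int,
    output_tokens: int,
    input_price_per_m: float,
    output_price_per_m: float,
) -> float:
    """Prices are USD per 1M tokens; input and output are billed at different rates."""
    return (prompt_tokens * input_price_per_m + output_tokens * output_price_per_m) / 1_000_000


# 10,000 rows shaped like the sample data (23 characters each, newline included).
csv_text = "2024-01-05,19500,North\n" * 10_000
prompt_tokens = estimate_tokens(csv_text)  # 57500 by this heuristic
# PLACEHOLDER prices per 1M tokens: substitute the current rates for your model.
cost = estimate_cost_usd(prompt_tokens, output_tokens=500, input_price_per_m=0.10, output_price_per_m=0.40)
print(prompt_tokens, round(cost, 5))
```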
Rate limits
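Free-tier rate limits are low, set per model in requests per minute and per day, and they change over time, so check the current rate-limit table for your model and tier rather than assuming a fixed number. Data analysis workflows that generate code and then re-run it for verification can hit these limits quickly. Add exponential backoff: catch `google.api_core.exceptions.ResourceExhausted` and retry after `2**attempt` seconds.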
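A minimal sketch of that backoff loop. `with_backoff` is a hypothetical helper; the retryable exception types are passed in, and `sleep` is injectable so the schedule can be checked without waiting.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def with_backoff(
    call: Callable[[], T],
    retry_on: Tuple[Type[BaseException], ...],
    max_attempts: int = 5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Hypothetical helper: run call(); on a retryable error wait 2**attempt seconds and retry."""
    for attempt in range(max_attempts):
        try:
            return call()
        except retry_on:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(2 ** attempt)  # waits 1, 2, 4, 8, ... seconds
    raise ValueError("max_attempts must be at least 1")
```

With the request code: `from google.api_core.exceptions import ResourceExhausted`, then `response = with_backoff(lambda: model.generate_content(prompt), retry_on=(ResourceExhausted,))`. Errors outside `retry_on`, such as a malformed request, propagate immediately, because retrying them unchanged cannot succeed.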
Common gotcha
Asking Gemini to 'analyze this data' without specifying an output format usually yields explanatory text, not code. Add 'return only Python code' or 'return a JSON summary' to control the response type. Also, pasting large CSVs (more than about 10K rows) into prompts burns tokens fast. In production, aggregate the data locally first, or upload it with the File API, keeping in mind that uploaded content is still billed as input tokens.
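When you ask for 'a JSON summary', parse and validate the reply instead of trusting it. With `google.generativeai` you can also pass `generation_config={"response_mime_type": "application/json"}` to `generate_content` on models that support JSON output. The sketch below is a hypothetical `parse_json_reply` helper that tolerates a ```json fence and checks for keys you name; the key names in the usage line are illustrative only.

```python
import json


def parse_json_reply(reply: str, required_keys: set) -> dict:
    """Hypothetical helper: parse a 'return a JSON summary' reply and check required keys."""
    text = reply.strip()
    if text.startswith("```"):
        # Drop the opening fence line (e.g. ```json) and everything after the closing fence.
        text = text.split("\n", 1)[1] if "\n" in text else ""
        text = text.rsplit("```", 1)[0]
    data = json.loads(text)  # raises json.JSONDecodeError if the model ignored the format
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    missing = required_keys - set(data)
    if missing:
        raise ValueError(f"JSON reply is missing keys: {sorted(missing)}")
    return data


summary = parse_json_reply('{"total_sales": 87000, "top_region": "West"}', {"total_sales", "top_region"})
```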
Error recovery
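`google.api_core.exceptions.InvalidArgument` (HTTP 400): the request was rejected as malformed, for example because of an invalid API key or a prompt longer than the model's input limit. Fix the request; retrying it unchanged won't help.
`google.api_core.exceptions.ResourceExhausted` (HTTP 429): rate limit or quota exceeded. Retry with exponential backoff.
`google.api_core.exceptions.PermissionDenied` (HTTP 403): the key isn't authorized for the requested resource. Check the key and its project's access to the API.
`ValueError` in generated code: the model's code failed when you ran it, typically because it assumed a data shape or type that doesn't match your data. Send the traceback back to the model or fix the code by hand.
Experienced dev note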
If you're using Gemini for data analysis, separate the analysis request from code generation. Ask Gemini to 'describe what plot would show sales trends' first, then in a follow-up request 'generate the code.' This two-step approach costs slightly more in tokens but produces higher-quality, more predictable code because the model committed to an approach before writing it. Also: always ask for output in a format you can parse: 'return valid JSON' is safer than 'explain the results' because JSON is machine-readable and you can version it.
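The two-step approach needs the second request to see the first answer, which a chat session provides. A minimal sketch, assuming a hypothetical `two_step_plot_code` helper that takes any `send` function, so it works with a chat session and can be exercised with a stub:

```python
from typing import Callable, Tuple


def two_step_plot_code(send: Callable[[str], str], data: str) -> Tuple[str, str]:
    """Hypothetical helper: ask for a plot plan first, then for code, in one conversation."""
    plan = send(
        "Describe what plot would best show sales trends in this data. "
        "Do not write any code yet.\n" + data
    )
    # The second message relies on the conversation history containing the plan.
    code = send(
        "Now generate matplotlib code for the plot you described. "
        "Return only executable Python code."
    )
    return plan, code
```

With the SDK: `chat = model.start_chat()`, then `plan, code = two_step_plot_code(lambda msg: chat.send_message(msg).text, data)`. Because the chat session resends earlier turns, the second request is billed for the plan's tokens too, which is the extra cost mentioned above.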
Check your understanding
You send a 50,000-row CSV to Gemini in a prompt asking it to 'find trends and plot them.' The API charges you per token (prompt and output), not for the quality or usefulness of the analysis. How would you restructure this to reduce costs while keeping the same analysis quality?
Answer hint
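Token cost is proportional to the data you send. Pre-summarize the data locally before sending it to Gemini (for example, aggregate to totals per region or per month) so only the summary goes into the prompt. If you will query the same dataset many times, context caching bills the repeated tokens at a reduced rate. Uploading with the File API alone doesn't reduce cost, because the file's tokens are billed each time a prompt references it. Specificity also helps: 'plot sales by month' produces shorter, more focused output than 'find trends', which reduces output tokens and improves the result.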
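The pre-summarizing step can be sketched with the standard library alone. This assumes the column names from the sample data (`Date`, `Sales`, `Region`); `summarize_sales_by_region` is a hypothetical helper, and grouping by month instead would follow the same pattern. The summary's size depends on the number of regions, not the number of rows.

```python
import csv
import io
from collections import defaultdict


def summarize_sales_by_region(csv_text: str) -> str:
    """Hypothetical helper: collapse raw rows into one line per region to keep the prompt small."""
    totals = defaultdict(int)
    counts = defaultdict(int)
    for row in csv.DictReader(io.StringIO(csv_text)):
        totals[row["Region"]] += int(row["Sales"])
        counts[row["Region"]] += 1
    lines = ["Region,TotalSales,Rows"]
    for region in sorted(totals):
        lines.append(f"{region},{totals[region]},{counts[region]}")
    return "\n".join(lines)


data = """Date,Sales,Region
2024-01-01,15000,North
2024-01-02,18500,South
2024-01-03,12000,East
2024-01-04,22000,West
2024-01-05,19500,North"""
print(summarize_sales_by_region(data))
# Region,TotalSales,Rows
# East,12000,1
# North,34500,2
# South,18500,1
# West,22000,1
```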