API Advanced hard · 8 min

Training data format for Gemini

What you will learn

The Gemini Developer API does not accept training data uploads for fine-tuning; instead, use the File API to pass structured documents to the model as in-context reference material.

Why this matters

Developers migrating from OpenAI's fine-tuning workflow often assume the Gemini API offers an equivalent upload-and-train path. Knowing what the API actually does with your data prevents wasted development effort and helps you choose the right pattern (file context vs. few-shot examples vs. system prompts) for your use case.

Skip if: you need true model weight updates. The Gemini Developer API used in this guide does not offer fine-tuning; for persistent model customization, look at a platform that does, such as supervised fine-tuning for Gemini on Vertex AI or OpenAI's fine-tuning API. Also skip the File API if your data fits comfortably in a single prompt: put few-shot examples in the system prompt instead of uploading files.

Explanation

The Gemini Developer API does not support fine-tuning in the traditional sense (training new model weights on your data). Instead, it offers two ways to put your data in front of the model: the File API, which stores uploaded documents so that several requests can reference them, and in-context learning (few-shot examples in system prompts). The File API accepts MIME types including text/plain, application/pdf, and other structured formats; files are stored server-side for 48 hours and can be referenced in any API call, in the same session or a later one, as long as you retain the file name or URI.

Under the hood, the File API stores the file; when a request references it, the file's content is tokenized and placed in the model's context alongside your prompt. The model doesn't learn from this data: it reads it at inference time. This is fundamentally different from fine-tuning, where model parameters are updated. To steer the model's behavior, use few-shot examples in the system_instruction field or include examples in your user prompt; these apply only to the requests that carry them and must fit within the context window.

Use the File API when your reference data is too large to inline comfortably in every request, changes infrequently, and you want to reuse it across multiple prompts without re-sending the bytes each time. Use inline in-context examples when your data is small or changes per request. Never expect uploaded data to change the model's weights: the model is frozen; your data influences only the prompt context of the requests that include it.
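The few-shot approach described above can be sketched as a small prompt builder. This is a minimal illustration, not a prescribed format: the function name build_system_instruction and the "Q:/A:" layout are assumptions made for this example.

```python
def build_system_instruction(task, examples):
    """Combine a task description with few-shot Q/A pairs into one string.

    `task` is a short instruction; `examples` is a list of
    (question, answer) tuples. The layout is a hypothetical convention.
    """
    lines = [task.strip(), "", "Examples:"]
    for question, answer in examples:
        lines.append(f"Q: {question}")
        lines.append(f"A: {answer}")
    return "\n".join(lines)


instruction = build_system_instruction(
    "Answer questions about Acme Corp in one short sentence.",
    [
        ("Where is Acme Corp headquartered?", "San Francisco."),
        ("When was Acme Corp founded?", "2020."),
    ],
)
print(instruction)
```

The resulting string could then be supplied as the system_instruction when constructing the model; it has to be sent again with every new model or session because nothing is stored on the server.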

Request code

python

import os
from pathlib import Path

import google.generativeai as genai

# Configure the SDK with the key from the environment (see Authentication).
genai.configure(api_key=os.environ['GOOGLE_API_KEY'])

# Write a small sample document so the example is self-contained.
file_path = Path('sample_doc.txt')
file_path.write_text(
    'Acme Corp raised $50M in Series B. Founded 2020. Headquarters: San Francisco.\n'
    'Key products: Widget X, Widget Y.'
)

# Upload the file; the returned object carries its name, URI, and expiry.
upload_response = genai.upload_file(path=file_path, mime_type='text/plain')
print(f'File uploaded with URI: {upload_response.uri}')
print(f'File name: {upload_response.name}')

try:
    # Pass the file object alongside the question so the model reads it as context.
    model = genai.GenerativeModel('gemini-2.0-flash')
    response = model.generate_content([
        "Based on the uploaded document, what was Acme Corp's funding amount?",
        upload_response,
    ])
    print(f'Model response: {response.text}')
finally:
    # Delete the upload even if generation fails, instead of waiting for expiry.
    genai.delete_file(name=upload_response.name)
    print('File deleted.')

Authentication

Set your Google API key in the environment: export GOOGLE_API_KEY='your-key-here'. The example passes it to genai.configure() explicitly. Note that configure() only stores the key; it does not contact the API, so it cannot confirm that the key is valid. To verify access, make a small request (for example, iterate over genai.list_files()) before uploading files.
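Before handing the key to the SDK, it can help to fail fast with a clear message when the variable is missing. The sketch below is plain Python with no SDK calls; the helper name load_api_key is hypothetical.

```python
import os


def load_api_key(env=os.environ, var_name="GOOGLE_API_KEY"):
    """Return the API key from `env`, or raise with an actionable message.

    `env` is any mapping (defaults to the process environment), which
    keeps the helper easy to test.
    """
    key = env.get(var_name, "").strip()
    if not key:
        raise RuntimeError(
            f"{var_name} is not set; run: export {var_name}='your-key-here'"
        )
    return key
```

A presence check like this only catches an unset or empty variable. A key that is present but invalid is still detected only by the first real API request.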

Response shape

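Field	Description
`upload_response`	File metadata object returned by genai.upload_file(); includes name, uri, and expiration_time
`generate_response`	Result returned by model.generate_content(); includes the generated text and usage_metadata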

Field guide

name

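Use this to reference the file in subsequent API calls (for example, genai.get_file() or genai.delete_file()) within 48 hours. Storing this value does not extend the file's life: if you need the data beyond 48 hours, keep the source document and re-upload it.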

expiration_time

Files are deleted automatically 48 hours after upload. The google-generativeai file helpers cover upload, list, get, and delete; there is no documented call to extend the expiration, so plan to re-upload from your own copy when a file expires.

usage_metadata

Track prompt_token_count carefully: an attached file is tokenized in full, so large files inflate token usage on every request that references them. For English plain text, a common rule of thumb is about 4 characters per token, so a 100KB document costs roughly 25K tokens; call model.count_tokens() for an exact figure.

uri

This is a storage identifier, not a download URL. You cannot share this with users or embed it in client code; the URI is tied to your API key's permissions.
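To make the usage_metadata guidance concrete, the sketch below estimates a text file's token footprint before you attach it. It assumes the rough rule of thumb of about 4 characters per token for English text; the real count depends on the tokenizer and the content, and the function name is hypothetical.

```python
import math


def estimate_tokens(num_chars, chars_per_token=4.0):
    """Rough token estimate for plain text.

    `chars_per_token` is an assumed average, not an exact tokenizer rule.
    """
    if num_chars < 0:
        raise ValueError("num_chars must be non-negative")
    return math.ceil(num_chars / chars_per_token)


# A 100KB (100,000 character) document under the 4-chars-per-token assumption.
print(estimate_tokens(100_000))
```

Treat the result as a budgeting aid only, and compare it against the prompt_token_count reported in usage_metadata after a real request.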

Setup trap

genai.configure() does not validate the key: it only stores it for later requests. If the key is missing or invalid, configure() still succeeds and the problem surfaces only on the first real request, which may be deep inside your upload code. Always validate with a small API call immediately after configure() (for example, iterate over genai.list_files()) to catch credential issues early.

Cost

File uploads and storage are free, but every request that references a file is charged for that file's tokens as input. There is no partial retrieval: the full file is counted each time, not just the passage that answers the question. At roughly 4 characters per token, a 1MB plain-text file is about 250K tokens per query. Budget accordingly: 10 queries per day against that file is about 2.5M input tokens per day, before counting your prompts and the responses.
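The budgeting arithmetic can be sketched as follows, assuming an attached file is counted in full on every request that references it. The function names and the per-million-token price are hypothetical placeholders; substitute the current rate for your model.

```python
def daily_input_tokens(file_tokens, queries_per_day, prompt_tokens=0):
    """Input tokens per day when each query attaches the whole file."""
    return (file_tokens + prompt_tokens) * queries_per_day


def daily_cost_usd(tokens, usd_per_million_tokens):
    """Convert a token count to dollars at an assumed price per 1M tokens."""
    return tokens / 1_000_000 * usd_per_million_tokens


# Example: a file of about 250K tokens queried 10 times a day.
tokens = daily_input_tokens(file_tokens=250_000, queries_per_day=10)
print(tokens)
# The 0.10 USD per million tokens figure is a placeholder, not a real price.
print(daily_cost_usd(tokens, usd_per_million_tokens=0.10))
```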

Rate limits

This guide assumes a File API upload limit of 100 files per 60 seconds per API key; quotas vary by tier, so confirm the current figure for your project. If you bulk-upload reference documents, implement exponential backoff and batch in groups of 5 with 1-second delays. Uploaded files also count against a per-project storage quota (20GB), so deleting files you no longer need with delete_file() frees room for new uploads.
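The batching and backoff advice above could look like the sketch below. It is deliberately SDK-free: upload_one is a caller-supplied function (for instance, a thin wrapper around the upload call), and all names and default values here are assumptions for illustration.

```python
import time


def batched(items, size):
    """Yield consecutive slices of `items`, each with at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def backoff_delays(base=1.0, factor=2.0, max_retries=5):
    """Exponential delays in seconds: base, base*factor, base*factor**2, ..."""
    return [base * factor ** attempt for attempt in range(max_retries)]


def upload_all(paths, upload_one, batch_size=5, pause=1.0, sleep=time.sleep):
    """Upload in batches, pausing between batches and retrying with backoff."""
    results = []
    for index, batch in enumerate(batched(paths, batch_size)):
        if index:
            sleep(pause)  # fixed pause between batches
        for path in batch:
            # One attempt per delay, plus a final attempt marked by None.
            for delay in backoff_delays() + [None]:
                try:
                    results.append(upload_one(path))
                    break
                except Exception:  # narrow this to rate-limit errors in real code
                    if delay is None:
                        raise
                    sleep(delay)
    return results
```

Passing sleep as a parameter keeps the function testable without real waiting. In production you would likely retry only on rate-limit or transient errors instead of catching every exception.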

Common gotcha

Developers assume file uploads persist indefinitely, or that a file uploaded this week will still be available next week without re-upload. Files expire 48 hours after upload. If you need persistent reference data, either re-upload from your own copy before use or keep the data in your own store (for example, a vector database such as Firestore or Pinecone) and send only the relevant passages to Gemini. Additionally, a newly uploaded file is not always usable immediately: larger or non-text files can spend time in a PROCESSING state, so check the file's state and wait until it is ACTIVE before referencing it in a prompt.
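Waiting for an upload to become usable can be sketched as a small polling loop. The helper below takes a get_state callable so it runs without the SDK; with google-generativeai, that callable might wrap genai.get_file(name).state.name, which is an assumption about how you would wire it up. The state strings, timeout, and poll interval are also illustrative.

```python
import time


def wait_until_active(get_state, timeout_s=60.0, poll_s=2.0,
                      sleep=time.sleep, clock=time.monotonic):
    """Poll `get_state()` until it returns "ACTIVE"; return the poll count.

    Raises RuntimeError if the state becomes "FAILED" and TimeoutError if
    the file is still not active after `timeout_s` seconds.
    """
    deadline = clock() + timeout_s
    polls = 0
    while True:
        polls += 1
        state = get_state()
        if state == "ACTIVE":
            return polls
        if state == "FAILED":
            raise RuntimeError("file processing failed")
        if clock() >= deadline:
            raise TimeoutError("file did not become ACTIVE in time")
        sleep(poll_s)
```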

Error recovery

API key not valid (400, typically raised as google.api_core.exceptions.InvalidArgument)

Your GOOGLE_API_KEY environment variable is unset or malformed. Verify with: echo $GOOGLE_API_KEY. Regenerate the key in Google AI Studio and re-export.

File size exceeds limit (413 Payload Too Large)

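The Gemini File API has a 2GB single-file limit. Split larger files into separate uploads, then either pass the relevant file objects together in one request or query them separately. Keep in mind that a file under the size limit can still exceed the model's context window once tokenized.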

File not found (404) on second session

Your 48-hour window expired. Re-upload the file from your own copy. To avoid the error, store the file name together with its expiration_time in your database and check the expiry before each use; saving the URI alone does not keep the file alive.

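Empty response (no candidates)

An empty candidates list usually means the request was blocked or the file was not ready, not that the file failed to match your query. Inspect response.prompt_feedback for a block reason, and confirm the upload with genai.list_files() or genai.get_file() to check that the file exists and is in the ACTIVE state. If the model answers but misses facts that are in the file, name the relevant section or entity in your query, or re-upload with more clearly structured data.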

Token count unexpectedly high

The whole file is counted as input on every request that references it, regardless of how narrow the question is. Check the size with model.count_tokens() before sending, and reduce file scope by splitting the document and attaching only the part you need. Structured markers (e.g., 'In the COMPANY_OVERVIEW section, what is...') help the model locate the answer but do not reduce the token count.
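For the expired-file case above, a small guard can decide whether a stored file reference is still safe to use. This sketch assumes you saved a timezone-aware expiration timestamp at upload time; the function name and the 10-minute safety margin are hypothetical choices.

```python
from datetime import datetime, timedelta, timezone


def needs_reupload(expiration_time, now=None,
                   safety_margin=timedelta(minutes=10)):
    """Return True if the file has expired or will expire within the margin.

    `expiration_time` must be a timezone-aware datetime.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    return now + safety_margin >= expiration_time


uploaded_at = datetime(2025, 1, 1, tzinfo=timezone.utc)
expires_at = uploaded_at + timedelta(hours=48)
print(needs_reupload(expires_at, now=uploaded_at + timedelta(hours=1)))
print(needs_reupload(expires_at, now=uploaded_at + timedelta(hours=47, minutes=55)))
```

The margin avoids starting a long request with a file that expires midway through it.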

Experienced dev note

Gemini's File API is a storage and reference layer, not a training interface, which makes it fundamentally different from OpenAI fine-tuning. What matters most in practice is not the 48-hour expiration or the file size limit but the fact that token costs are paid per request: a file costs nothing to store, yet its full token count is billed every time a prompt references it. Experienced teams optimize by: (1) breaking reference data into modular documents (one per entity/topic), (2) attaching only the documents a query needs, ideally 5–10KB each, (3) caching file names in Redis/Firestore with expiration timestamps, and (4) pre-computing embeddings of file summaries to filter which files to attach. This prevents 'token bloat', where a single broad query drags 500KB of irrelevant context into the prompt.
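Step (4) above, filtering files by summary embeddings, can be sketched with plain cosine similarity. The vectors here are toy values and the file names are made up; in practice the embeddings would come from whatever embedding model you already use, which is outside the scope of this sketch.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def select_files(query_vec, summary_vecs, top_k=2):
    """Return the `top_k` file names whose summary vectors best match the query."""
    ranked = sorted(
        summary_vecs,
        key=lambda name: cosine(query_vec, summary_vecs[name]),
        reverse=True,
    )
    return ranked[:top_k]


# Toy 2-dimensional "embeddings" keyed by hypothetical file names.
summaries = {
    "files/pricing": [1.0, 0.0],
    "files/support": [0.0, 1.0],
    "files/specs": [0.9, 0.1],
}
print(select_files([1.0, 0.0], summaries, top_k=2))
```

Only the selected files would then be attached to the prompt, which keeps the per-request token count tied to what the question needs.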

Check your understanding

You have a 2GB product catalog and need to answer customer questions about product specifications. Why shouldn't you upload the entire catalog as a single file, and what's the token-cost implication if you do?

Answer hint

A file attached to a prompt is tokenized in full, no matter how narrow the question is. At roughly 4 characters per token, 2GB of text is on the order of 500 million tokens, far beyond any model's context window, so the request would fail outright, not merely cost more. Split the catalog into per-category or per-product files and attach only the ones a question needs; keeping the attached context under 100KB holds each query to roughly 25K input tokens.

VERSION google-generativeai 0.8.x provides genai.upload_file() and genai.delete_file() as used above; earlier releases may differ, so upgrade with: pip install --upgrade google-generativeai. Note that Google has deprecated google-generativeai in favor of the newer google-genai SDK, so check the current migration guidance before starting a new project.
