How to use Gemini for document analysis
Quick answer
Use the gemini-1.5-pro or gemini-2.0-flash model through the Gemini API to analyze documents. Send the document text, together with an instruction such as "Analyze this document", in a generate_content request. The model returns a summary or other insights based on the document content.
Prerequisites
- Python 3.9+
- A Google account with Gemini API access and an API key (for example, one created in Google AI Studio)
- The GOOGLE_API_KEY environment variable set to your API key
- The Google Gen AI SDK: pip install google-genai
Setup
Install the Google Gen AI SDK and configure your environment with your API key.
- Install the SDK:
pip install google-genai
- Set your API key in the environment:
export GOOGLE_API_KEY='your_api_key_here' (Linux/macOS) or set GOOGLE_API_KEY=your_api_key_here (Windows)
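If the key is missing, `os.environ["GOOGLE_API_KEY"]` fails with a bare `KeyError`. You can check for the key before creating the client so the script fails early with a clear message. This is a minimal sketch that uses only the standard library. The helper name `require_api_key` is hypothetical and is not part of the SDK.

```python
import os
from typing import Mapping, Optional


def require_api_key(env: Optional[Mapping[str, str]] = None, name: str = "GOOGLE_API_KEY") -> str:
    """Return the API key from the environment or raise a descriptive error.

    require_api_key is a hypothetical helper for this guide, not an SDK function.
    """
    env = os.environ if env is None else env
    key = env.get(name, "").strip()  # treat a blank value the same as a missing one
    if not key:
        raise RuntimeError(
            f"{name} is not set. Export it in your shell before running the script, "
            f"for example: export {name}='your_api_key_here'"
        )
    return key
```

You would then create the client with `genai.Client(api_key=require_api_key())`.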
Step by step
Send the document text and an analysis instruction to the gemini-1.5-pro model, then read the model's summary or insights from the response.
```python
import os

from google import genai

# Create a client with the API key stored in the environment
client = genai.Client(api_key=os.environ["GOOGLE_API_KEY"])

document_text = """\
Artificial intelligence (AI) is transforming industries by enabling new capabilities in automation, data analysis, and decision-making. Document analysis helps extract key information efficiently.
"""

# Send the analysis instruction and the document text in a single request
response = client.models.generate_content(
    model="gemini-1.5-pro",
    contents=f"Analyze this document:\n{document_text}",
)

print("Document analysis result:")
print(response.text)  # the generated analysis as plain text
```
Output:
Document analysis result: The document highlights AI's impact on industries, focusing on automation, data analysis, and decision-making improvements through document analysis.
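To get structured insights instead of free-form prose, one approach is to ask for JSON in the prompt and then parse the reply. The sketch below rests on several assumptions. The prompt wording, the `summary` and `key_points` field names, and the `build_prompt` and `parse_analysis` helpers are all hypothetical. Models sometimes wrap JSON in a Markdown code fence, so the parser strips one if it is present.

```python
import json
import re

# Hypothetical prompt; the field names are illustrative, not required by the API
PROMPT_TEMPLATE = (
    "Analyze the document below. Reply with JSON only, using the keys "
    '"summary" (string) and "key_points" (list of strings).\n\n'
    "Document:\n{document}"
)

FENCE = "`" * 3  # a Markdown code fence marker
# Matches a whole reply wrapped in a code fence, with an optional "json" tag
_FENCED = re.compile(FENCE + r"(?:json)?\s*(.*?)\s*" + FENCE, re.DOTALL)


def build_prompt(document: str) -> str:
    """Insert the document text into the analysis prompt."""
    return PROMPT_TEMPLATE.format(document=document)


def parse_analysis(reply: str) -> dict:
    """Parse the model's JSON reply, tolerating an optional code fence."""
    text = reply.strip()
    match = _FENCED.fullmatch(text)
    if match:
        text = match.group(1)
    data = json.loads(text)
    # Check that the reply has the shape the prompt asked for
    if (
        not isinstance(data, dict)
        or not isinstance(data.get("summary"), str)
        or not isinstance(data.get("key_points"), list)
    ):
        raise ValueError("reply is missing 'summary' or 'key_points'")
    return data
```

With the SDK, pass `build_prompt(document_text)` as `contents` and call `parse_analysis(response.text)`. You can also set `response_mime_type="application/json"` in `types.GenerateContentConfig` to ask the model to return raw JSON.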
Common variations
Use a different model such as gemini-2.0-flash for faster responses, or stream the response to show output as it is generated. The SDK also provides async versions of its methods under client.aio, which suit applications that analyze many documents concurrently.
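In the Python SDK, streaming uses `client.models.generate_content_stream(...)`. That method yields partial responses, and each one's `.text` holds the next piece of output. The helper below is a hypothetical sketch that prints each piece as it arrives and returns the assembled text. It accepts any iterable of objects with a `text` attribute, so you can exercise it without network access.

```python
import sys
from typing import Iterable, Optional, TextIO


def collect_stream(chunks: Iterable, out: Optional[TextIO] = None) -> str:
    """Print streamed text as it arrives and return the full response.

    collect_stream is a hypothetical helper; `chunks` can be the iterator
    returned by client.models.generate_content_stream(...).
    """
    if out is None:
        out = sys.stdout
    parts = []
    for chunk in chunks:
        piece = getattr(chunk, "text", None) or ""  # some chunks may carry no text
        out.write(piece)
        out.flush()  # show partial output immediately
        parts.append(piece)
    out.write("\n")
    return "".join(parts)
```

Usage with the SDK: `collect_stream(client.models.generate_content_stream(model="gemini-2.0-flash", contents=f"Analyze this document:\n{document_text}"))`.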
```python
import asyncio
import os

from google import genai

async def analyze_document_async(text):
    client = genai.Client(api_key=os.environ["GOOGLE_API_KEY"])
    # client.aio provides async versions of the client's methods
    response = await client.aio.models.generate_content(
        model="gemini-2.0-flash",
        contents=f"Analyze this document:\n{text}",
    )
    print("Async analysis result:")
    print(response.text)

asyncio.run(analyze_document_async("AI is revolutionizing document processing."))
```
Output:
Async analysis result: AI is revolutionizing document processing by enabling faster and more accurate extraction of key information.
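For processing at scale, a common pattern is to run many async requests at once while capping how many are in flight, which helps you stay under rate limits. The names in this sketch are hypothetical. `analyze_many` and `max_concurrency` are illustrative, and `analyze` stands for any coroutine function that takes a document and returns text, for example one that awaits `client.aio.models.generate_content(...)` and returns `response.text`.

```python
import asyncio
from typing import Awaitable, Callable, List, Sequence


async def analyze_many(
    documents: Sequence[str],
    analyze: Callable[[str], Awaitable[str]],
    max_concurrency: int = 5,
) -> List[str]:
    """Analyze documents concurrently, with at most max_concurrency calls in flight.

    analyze_many is a hypothetical helper; results keep the order of `documents`.
    """
    semaphore = asyncio.Semaphore(max_concurrency)

    async def run_one(doc: str) -> str:
        async with semaphore:  # wait for a free slot before sending the request
            return await analyze(doc)

    # gather returns results in the same order as its arguments
    return list(await asyncio.gather(*(run_one(doc) for doc in documents)))
```

A semaphore keeps the code simple. If you also hit per-minute quotas, combine it with retries on rate-limit errors.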
Troubleshooting
- If you get authentication errors, verify that the GOOGLE_API_KEY environment variable is set to a valid key.
- For rate limit errors (HTTP 429), retry the request with exponential backoff.
- If the analysis is cut off or incomplete, increase max_output_tokens (set through types.GenerateContentConfig) or use a more capable model such as gemini-1.5-pro.
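Exponential backoff can be sketched as a small retry wrapper. This helper is hypothetical and is not part of the SDK. It retries a call when `is_retryable(exc)` returns true, doubles the delay after each failure, and adds random jitter. The example assumes that the Google Gen AI SDK reports rate-limit failures as `google.genai.errors.APIError` with `code == 429`, which makes `lambda e: getattr(e, "code", None) == 429` a reasonable predicate.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_backoff(
    func: Callable[[], T],
    is_retryable: Callable[[Exception], bool],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 32.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call func(), retrying retryable errors with exponential backoff and jitter.

    call_with_backoff is a hypothetical helper for this guide, not an SDK function.
    """
    for attempt in range(max_attempts):
        try:
            return func()
        except Exception as exc:
            # Give up on non-retryable errors or after the last attempt
            if not is_retryable(exc) or attempt == max_attempts - 1:
                raise
            # Delay ceiling doubles each attempt (1s, 2s, 4s, ...), capped at max_delay
            delay = min(max_delay, base_delay * (2 ** attempt))
            # Full jitter spreads out retries from concurrent clients
            sleep(random.uniform(0, delay))
    raise RuntimeError("max_attempts must be at least 1")
```

For example: `call_with_backoff(lambda: client.models.generate_content(model="gemini-1.5-pro", contents=f"Analyze this document:\n{document_text}"), lambda e: getattr(e, "code", None) == 429)`.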
Key Takeaways
- Use the gemini-1.5-pro model for detailed document analysis.
- Always set your API key securely via environment variables.
- Async calls let you analyze many documents concurrently, and streaming shows results as they are generated.