How to use AI for data analysis in Python
Direct answer

Use AI models like gpt-4o in Python to analyze data by sending your dataset (or a summary of it) as a prompt and requesting insights, summaries, or visualizations via the chat.completions.create API.

Setup

Install: pip install openai
Env vars: OPENAI_API_KEY
Imports:
import os
from openai import OpenAI

Examples
In: Analyze sales data trends for Q1 2026 with summary and recommendations.
Out: Sales increased steadily in January and February, then dipped slightly in March due to supply chain issues. Recommend focusing on inventory management and marketing in March.

In: Provide descriptive statistics and identify outliers in the dataset: [12, 15, 14, 22, 13, 100, 14].
Out: Mean is 27.14, median is 14, and the standard deviation is high due to the outlier. The value 100 is an outlier and should be investigated.
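It is worth sanity-checking model-reported statistics locally before trusting them. A minimal sketch using only the standard library; the 1.5 × IQR fence is one common outlier rule, chosen here for illustration:

```python
import statistics

data = [12, 15, 14, 22, 13, 100, 14]

mean = statistics.mean(data)      # 190 / 7 ≈ 27.14
median = statistics.median(data)  # 14
stdev = statistics.stdev(data)    # inflated by the extreme value 100

# Flag outliers with the 1.5 * IQR rule (exclusive quartiles)
q1, _, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1
outliers = [x for x in data if x < q1 - 1.5 * iqr or x > q3 + 1.5 * iqr]

print(f"mean={mean:.2f}, median={median}, stdev={stdev:.2f}, outliers={outliers}")
```

Running this confirms the median and flags 100 as the only outlier, and catches any arithmetic slip in the model's reported mean.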
In: Generate Python code to plot a histogram of customer ages from the dataset.
Out:
import matplotlib.pyplot as plt

ages = [23, 45, 31, 35, 40, 29]
plt.hist(ages, bins=5)
plt.title('Customer Age Distribution')
plt.show()
Integration steps

- Install the OpenAI Python SDK and set your API key in the OPENAI_API_KEY environment variable.
- Import the OpenAI client and initialize it with the API key from os.environ.
- Prepare a prompt describing the data analysis task, or provide a summary of your dataset as input.
- Call the chat.completions.create method with model gpt-4o and your prompt in the messages array.
- Extract the analysis or code from the response's choices[0].message.content field.
- Use or display the AI-generated insights, summaries, or code in your data analysis workflow.
Full code
import os
from openai import OpenAI
# Initialize client
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])
# Example data analysis prompt
prompt = (
    "You are a data analyst. Given the sales data for Q1 2026:"
    " January: 100, February: 120, March: 90."
    " Provide a summary of trends and recommendations."
)
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": prompt}]
)
analysis = response.choices[0].message.content
print("AI Data Analysis Output:\n", analysis)

Output
AI Data Analysis Output: The sales data shows an upward trend from January to February, increasing by 20%. However, there is a decline in March by 25% compared to February. This dip may indicate seasonal factors or supply chain issues. Recommendations include investigating March's drop, optimizing inventory, and boosting marketing efforts during that period.
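The response object also reports token usage (response.usage), which you can use for rough cost tracking. A minimal sketch; the per-1k-token prices below are illustrative assumptions, not official rates:

```python
def estimate_cost(prompt_tokens, completion_tokens,
                  in_price_per_1k=0.0025, out_price_per_1k=0.01):
    """Rough cost estimate; the per-1k-token prices are placeholder assumptions."""
    return (prompt_tokens / 1000) * in_price_per_1k \
         + (completion_tokens / 1000) * out_price_per_1k

# In real code: usage = response.usage, then usage.prompt_tokens etc.
cost = estimate_cost(50, 100)  # hypothetical token counts
print(f"Estimated cost: ${cost:.4f}")
```

Check your provider's current pricing page before relying on any numbers like these.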
API trace

Request:
{"model": "gpt-4o", "messages": [{"role": "user", "content": "You are a data analyst..."}]}

Response:
{"choices": [{"message": {"content": "The sales data shows an upward trend..."}}], "usage": {"total_tokens": 150}}

Extract:
response.choices[0].message.content

Variants
Streaming response for large data analysis ›
Use streaming when expecting long or detailed analysis to improve responsiveness and user experience.
import os
from openai import OpenAI

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])
prompt = "Analyze the large dataset and provide insights."
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": prompt}],
    stream=True
)
# Each chunk's delta is an object, not a dict; content may be None on some chunks
for chunk in response:
    print(chunk.choices[0].delta.content or "", end="")

Async version for concurrent data analysis calls ›
Use async calls to handle multiple data analysis requests concurrently for efficiency.
import os
import asyncio
from openai import AsyncOpenAI

# Use the async client; it exposes the same API with awaitable calls
client = AsyncOpenAI(api_key=os.environ["OPENAI_API_KEY"])

async def analyze_data(prompt):
    response = await client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}]
    )
    return response.choices[0].message.content

async def main():
    prompts = [
        "Summarize sales data Q1.",
        "Identify outliers in dataset [10, 20, 30, 1000]."
    ]
    results = await asyncio.gather(*(analyze_data(p) for p in prompts))
    for r in results:
        print(r)

asyncio.run(main())

Use Claude 3.5 Sonnet for advanced coding and analysis ›
Use Claude 3.5 Sonnet when you need superior code generation and complex data analysis.
import os
from anthropic import Anthropic

client = Anthropic(api_key=os.environ["ANTHROPIC_API_KEY"])
prompt = "Analyze the dataset and generate Python code for visualization."
message = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=1024,
    system="You are a helpful data analyst.",
    messages=[{"role": "user", "content": prompt}]
)
print(message.content[0].text)

Performance
Latency: ~800ms for gpt-4o non-streaming calls
Cost: ~$0.002 per 500 tokens exchanged with gpt-4o
Rate limits: Tier 1: 500 requests per minute / 30,000 tokens per minute
- Summarize data before sending to reduce tokens.
- Use concise prompts focused on specific analysis tasks.
- Leverage streaming for large outputs to start processing early.
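The first tip can be implemented by computing summary statistics locally and sending only those to the model. A sketch using the standard library; the prompt wording and sample data are illustrative:

```python
import statistics

def summarize_for_prompt(label, values):
    """Condense a numeric column into a few statistics for the prompt."""
    return (
        f"{label}: n={len(values)}, min={min(values)}, max={max(values)}, "
        f"mean={statistics.mean(values):.2f}, median={statistics.median(values)}"
    )

sales = [100, 120, 90, 115, 130, 95, 105]  # hypothetical daily sales
summary = summarize_for_prompt("daily_sales", sales)
prompt = f"Given these summary statistics, describe the trend:\n{summary}"
print(prompt)
```

One short line of statistics often carries enough signal for a trend question, at a fraction of the token cost of the raw column.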
| Approach | Latency | Cost/call | Best for |
|---|---|---|---|
| Standard gpt-4o call | ~800ms | ~$0.002 | General data analysis and summaries |
| Streaming gpt-4o | Starts immediately, total ~1s+ | ~$0.002 | Long or detailed analysis outputs |
| Claude 3.5 Sonnet | ~900ms | ~$0.003 | Advanced coding and complex data insights |
Quick tip
Frame your data analysis prompt clearly with context and specific questions to get precise AI insights.
Common mistake
Beginners often send large raw datasets directly to the model instead of summarizing them or extracting key features first, which inflates token costs and can exceed context limits.
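To see why this matters, compare an approximate token count for a raw dataset versus its summary. The 4-characters-per-token rule is a rough heuristic for English text, not the model's real tokenizer:

```python
def approx_tokens(text):
    """Very rough token estimate (~4 characters per token for English text)."""
    return max(1, len(text) // 4)

raw = ", ".join(str(1000 + i) for i in range(5000))  # a raw 5,000-value column
summary = "n=5000, min=1000, max=5999, mean=3499.5"  # what you should send instead

print(approx_tokens(raw), "tokens vs", approx_tokens(summary))
```

The raw column costs thousands of estimated tokens while the summary costs roughly ten, so summarizing first cuts the prompt by orders of magnitude.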