How to use AI for data analysis in Python
Direct answer

Use AI models like gpt-4o in Python to analyze data by sending your dataset (or a summary of it) as a prompt and requesting insights, summaries, or visualizations via the chat.completions.create API.

Setup

Install: pip install openai
Env vars: OPENAI_API_KEY
Imports:
import os
from openai import OpenAI

Examples
In: Analyze sales data trends for Q1 2026 with summary and recommendations.
Out: Sales increased steadily in January and February, then dipped slightly in March due to supply chain issues. Recommend focusing on inventory management and marketing in March.

In: Provide descriptive statistics and identify outliers in the dataset: [12, 15, 14, 22, 13, 100, 14].
Out: Mean is 27.14, median is 14, and the standard deviation is high due to the outlier. The value 100 is an outlier and should be investigated.
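It is worth sanity-checking model-reported statistics locally before trusting them. A minimal sketch using only the standard library; the 1.5 × IQR fence is one common outlier rule, chosen here for illustration:

```python
import statistics

data = [12, 15, 14, 22, 13, 100, 14]

mean = statistics.mean(data)      # 190 / 7 ≈ 27.14
median = statistics.median(data)  # 14
stdev = statistics.stdev(data)    # inflated by the extreme value 100

# Flag outliers with the 1.5 * IQR rule (exclusive quartiles)
q1, _, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1
outliers = [x for x in data if x < q1 - 1.5 * iqr or x > q3 + 1.5 * iqr]

print(f"mean={mean:.2f}, median={median}, stdev={stdev:.2f}, outliers={outliers}")
```

Running this confirms the median and flags 100 as the only outlier, and catches any arithmetic slip in the model's reported mean.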
In: Generate Python code to plot a histogram of customer ages from the dataset.
Out:
import matplotlib.pyplot as plt

ages = [23, 45, 31, 35, 40, 29]
plt.hist(ages, bins=5)
plt.title('Customer Age Distribution')
plt.show()
Integration steps

- Install the OpenAI Python SDK and set your API key in the OPENAI_API_KEY environment variable.
- Import the OpenAI client and initialize it with the API key from os.environ.
- Prepare a prompt describing the data analysis task, or provide a summary of your dataset as input.
- Call the chat.completions.create method with model gpt-4o and your prompt in the messages array.
- Extract the analysis or code from the response's choices[0].message.content field.
- Use or display the AI-generated insights, summaries, or code in your data analysis workflow.
Full code
import os
from openai import OpenAI
# Initialize client
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])
# Example data analysis prompt
prompt = (
    "You are a data analyst. Given the sales data for Q1 2026:"
    " January: 100, February: 120, March: 90."
    " Provide a summary of trends and recommendations."
)
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": prompt}]
)
analysis = response.choices[0].message.content
print("AI Data Analysis Output:\n", analysis)

Output
AI Data Analysis Output: The sales data shows an upward trend from January to February, increasing by 20%. However, there is a decline in March by 25% compared to February. This dip may indicate seasonal factors or supply chain issues. Recommendations include investigating March's drop, optimizing inventory, and boosting marketing efforts during that period.
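The response object also reports token usage (response.usage), which you can use for rough cost tracking. A minimal sketch; the per-1k-token prices below are illustrative assumptions, not official rates:

```python
def estimate_cost(prompt_tokens, completion_tokens,
                  in_price_per_1k=0.0025, out_price_per_1k=0.01):
    """Rough cost estimate; the per-1k-token prices are placeholder assumptions."""
    return (prompt_tokens / 1000) * in_price_per_1k \
         + (completion_tokens / 1000) * out_price_per_1k

# In real code: usage = response.usage, then usage.prompt_tokens etc.
cost = estimate_cost(50, 100)  # hypothetical token counts
print(f"Estimated cost: ${cost:.4f}")
```

Check your provider's current pricing page before relying on any numbers like these.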
API trace

Request:
{"model": "gpt-4o", "messages": [{"role": "user", "content": "You are a data analyst..."}]}

Response:
{"choices": [{"message": {"content": "The sales data shows an upward trend..."}}], "usage": {"total_tokens": 150}}

Extract:
response.choices[0].message.content

Variants
Streaming response for large data analysis ›
Use streaming when expecting long or detailed analysis to improve responsiveness and user experience.
import os
from openai import OpenAI

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])
prompt = "Analyze the large dataset and provide insights."
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": prompt}],
    stream=True
)
# Each chunk's delta is an object, not a dict; content may be None on some chunks
for chunk in response:
    print(chunk.choices[0].delta.content or "", end="")

Async version for concurrent data analysis calls ›
Use async calls to handle multiple data analysis requests concurrently for efficiency.
import os
import asyncio
from openai import AsyncOpenAI

# Use the async client; it exposes the same API with awaitable calls
client = AsyncOpenAI(api_key=os.environ["OPENAI_API_KEY"])

async def analyze_data(prompt):
    response = await client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}]
    )
    return response.choices[0].message.content

async def main():
    prompts = [
        "Summarize sales data Q1.",
        "Identify outliers in dataset [10, 20, 30, 1000]."
    ]
    results = await asyncio.gather(*(analyze_data(p) for p in prompts))
    for r in results:
        print(r)

asyncio.run(main())

Use Claude 3.5 Sonnet for advanced coding and analysis ›
Use Claude 3.5 Sonnet when you need superior code generation and complex data analysis.
import os
from anthropic import Anthropic

client = Anthropic(api_key=os.environ["ANTHROPIC_API_KEY"])
prompt = "Analyze the dataset and generate Python code for visualization."
message = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=1024,
    system="You are a helpful data analyst.",
    messages=[{"role": "user", "content": prompt}]
)
print(message.content[0].text)

Performance
Latency: ~800ms for gpt-4o non-streaming calls
Cost: ~$0.002 per 500 tokens exchanged with gpt-4o
Rate limits: Tier 1: 500 requests per minute / 30,000 tokens per minute
- Summarize data before sending to reduce tokens.
- Use concise prompts focused on specific analysis tasks.
- Leverage streaming for large outputs to start processing early.
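The first tip can be implemented by computing summary statistics locally and sending only those to the model. A sketch using the standard library; the prompt wording and sample data are illustrative:

```python
import statistics

def summarize_for_prompt(label, values):
    """Condense a numeric column into a few statistics for the prompt."""
    return (
        f"{label}: n={len(values)}, min={min(values)}, max={max(values)}, "
        f"mean={statistics.mean(values):.2f}, median={statistics.median(values)}"
    )

sales = [100, 120, 90, 115, 130, 95, 105]  # hypothetical daily sales
summary = summarize_for_prompt("daily_sales", sales)
prompt = f"Given these summary statistics, describe the trend:\n{summary}"
print(prompt)
```

One short line of statistics often carries enough signal for a trend question, at a fraction of the token cost of the raw column.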
| Approach | Latency | Cost/call | Best for |
|---|---|---|---|
| Standard gpt-4o call | ~800ms | ~$0.002 | General data analysis and summaries |
| Streaming gpt-4o | Starts immediately, total ~1s+ | ~$0.002 | Long or detailed analysis outputs |
| Claude 3.5 Sonnet | ~900ms | ~$0.003 | Advanced coding and complex data insights |
Quick tip
Frame your data analysis prompt clearly with context and specific questions to get precise AI insights.
Common mistake
Beginners often send large raw datasets directly to the model instead of summarizing them or extracting key features first, which inflates token costs and can exceed context limits.
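To see why this matters, compare an approximate token count for a raw dataset versus its summary. The 4-characters-per-token rule is a rough heuristic for English text, not the model's real tokenizer:

```python
def approx_tokens(text):
    """Very rough token estimate (~4 characters per token for English text)."""
    return max(1, len(text) // 4)

raw = ", ".join(str(1000 + i) for i in range(5000))  # a raw 5,000-value column
summary = "n=5000, min=1000, max=5999, mean=3499.5"  # what you should send instead

print(approx_tokens(raw), "tokens vs", approx_tokens(summary))
```

The raw column costs thousands of estimated tokens while the summary costs roughly ten, so summarizing first cuts the prompt by orders of magnitude.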