Code beginner · 3 min read

How to use AI for data analysis in Python

Direct answer
Use an AI model such as gpt-4o from Python to analyze data: send your dataset (or a summary of it) as a prompt and request insights, summaries, or visualizations via the chat.completions.create API.

Setup

Install
bash
pip install openai
Env vars
OPENAI_API_KEY
Imports
python
import os
from openai import OpenAI

Examples

In: Analyze sales data trends for Q1 2026 with summary and recommendations.
Out: Sales increased steadily in January and February, then dipped slightly in March due to supply chain issues. Recommend focusing on inventory management and marketing in March.
In: Provide descriptive statistics and identify outliers in the dataset: [12, 15, 14, 22, 13, 100, 14].
Out: Mean is 27.14, median is 14, and the standard deviation is high because of the outlier. The value 100 is an outlier and should be investigated.
In: Generate Python code to plot a histogram of customer ages from the dataset.
Out:
import matplotlib.pyplot as plt

ages = [23, 45, 31, 35, 40, 29]
plt.hist(ages, bins=5)
plt.title('Customer Age Distribution')
plt.show()
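Model-generated statistics like the ones above can contain arithmetic slips, so it is worth re-checking any numbers locally before acting on them. A quick sketch using Python's built-in statistics module on the same dataset:

```python
import statistics

# Dataset from the outlier example above
data = [12, 15, 14, 22, 13, 100, 14]

mean = statistics.mean(data)      # arithmetic mean
median = statistics.median(data)  # middle value, robust to outliers
stdev = statistics.stdev(data)    # sample standard deviation, inflated by 100

print(f"mean={mean:.2f}, median={median}, stdev={stdev:.2f}")
# → mean=27.14, median=14, stdev=32.74
```

The large gap between the mean and the median is itself a quick signal that an outlier is present.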

Integration steps

  1. Install the OpenAI Python SDK and set your API key in the environment variable OPENAI_API_KEY.
  2. Import the OpenAI client and initialize it with the API key from os.environ.
  3. Prepare a prompt describing the data analysis task or provide a summary of your dataset as input.
  4. Call the chat.completions.create method with model gpt-4o and your prompt in the messages array.
  5. Extract the analysis or code from the response's choices[0].message.content field.
  6. Use or display the AI-generated insights, summaries, or code for your data analysis.

Full code

python
import os
from openai import OpenAI

# Initialize client
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

# Example data analysis prompt
prompt = (
    "You are a data analyst. Given the sales data for Q1 2026:"
    " January: 100, February: 120, March: 90."
    " Provide a summary of trends and recommendations."
)

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": prompt}]
)

analysis = response.choices[0].message.content
print("AI Data Analysis Output:\n", analysis)
output
AI Data Analysis Output:
The sales data shows an upward trend from January to February, increasing by 20%. However, there is a decline in March by 25% compared to February. This dip may indicate seasonal factors or supply chain issues. Recommendations include investigating March's drop, optimizing inventory, and boosting marketing efforts during that period.

API trace

Request
json
{"model": "gpt-4o", "messages": [{"role": "user", "content": "You are a data analyst..."}]}
Response
json
{"choices": [{"message": {"content": "The sales data shows an upward trend..."}}], "usage": {"total_tokens": 150}}
Extract: response.choices[0].message.content

Variants

Streaming response for large data analysis

Use streaming when expecting long or detailed analysis to improve responsiveness and user experience.

python
import os
from openai import OpenAI

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

prompt = "Analyze the large dataset and provide insights."

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": prompt}],
    stream=True
)

for chunk in response:
    # delta is an object, not a dict; content may be None on some chunks
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")
Async version for concurrent data analysis calls

Use async calls to handle multiple data analysis requests concurrently for efficiency.

python
import os
import asyncio
from openai import AsyncOpenAI

# The async client exposes the same methods; you await them directly
client = AsyncOpenAI(api_key=os.environ["OPENAI_API_KEY"])

async def analyze_data(prompt):
    response = await client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}]
    )
    return response.choices[0].message.content

async def main():
    prompts = [
        "Summarize sales data Q1.",
        "Identify outliers in dataset [10, 20, 30, 1000]."
    ]
    results = await asyncio.gather(*(analyze_data(p) for p in prompts))
    for r in results:
        print(r)

asyncio.run(main())
Use Claude 3.5 Sonnet for advanced coding and analysis

Use Claude 3.5 Sonnet when you need superior code generation and complex data analysis.

python
import os
from anthropic import Anthropic

client = Anthropic(api_key=os.environ["ANTHROPIC_API_KEY"])

prompt = "Analyze the dataset and generate Python code for visualization."

message = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=1024,
    system="You are a helpful data analyst.",
    messages=[{"role": "user", "content": prompt}]
)

print(message.content[0].text)

Performance

Latency: ~800ms for gpt-4o non-streaming calls
Cost: ~$0.002 per 500 tokens exchanged with gpt-4o
Rate limits: Tier 1: 500 requests per minute / 30,000 tokens per minute
  • Summarize data before sending to reduce tokens.
  • Use concise prompts focused on specific analysis tasks.
  • Leverage streaming for large outputs to start processing early.
Approach | Latency | Cost/call | Best for
Standard gpt-4o call | ~800ms | ~$0.002 | General data analysis and summaries
Streaming gpt-4o | Starts immediately, total ~1s+ | ~$0.002 | Long or detailed analysis outputs
Claude 3.5 Sonnet | ~900ms | ~$0.003 | Advanced coding and complex data insights
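If you approach the rate limits above, transient failures are normal and worth retrying. A minimal sketch of exponential backoff with jitter; the helper name call_with_backoff is hypothetical, and in real use you would pass retry_on=(openai.RateLimitError,):

```python
import random
import time

def call_with_backoff(fn, max_attempts=5, base_delay=1.0, retry_on=(Exception,)):
    """Retry fn() on transient errors, doubling the delay each attempt."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except retry_on:
            if attempt == max_attempts - 1:
                raise  # out of attempts; surface the error
            # exponential backoff plus jitter to avoid synchronized retries
            time.sleep(base_delay * (2 ** attempt) + random.uniform(0, 0.5))

# Usage with the OpenAI client (not run here):
# analysis = call_with_backoff(
#     lambda: client.chat.completions.create(
#         model="gpt-4o",
#         messages=[{"role": "user", "content": prompt}],
#     ),
#     retry_on=(openai.RateLimitError,),
# )
```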

Quick tip

Frame your data analysis prompt clearly with context and specific questions to get precise AI insights.

Common mistake

Beginners often send large raw datasets directly instead of summarizing them or extracting key features before prompting the AI, which wastes tokens and degrades answer quality.
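One way to avoid this is to compress the data to a few key features per group before building the prompt. A sketch using only the standard library, with hypothetical daily sales figures standing in for a large dataset:

```python
import statistics

# Hypothetical raw data: 30 daily sales figures per month (could be far larger)
daily_sales = {
    "January": [3.2, 3.4, 3.1, 3.6, 3.3] * 6,
    "February": [4.0, 4.2, 3.9, 4.1, 4.3] * 6,
    "March": [2.9, 3.0, 2.8, 3.1, 2.7] * 6,
}

# Reduce each month to count, mean, min, and max instead of raw rows
summary = "\n".join(
    f"{month}: n={len(vals)}, mean={statistics.mean(vals):.2f}, "
    f"min={min(vals)}, max={max(vals)}"
    for month, vals in daily_sales.items()
)

prompt = (
    "You are a data analyst. Monthly sales summary:\n"
    f"{summary}\n"
    "Describe the trend and recommend actions."
)
print(prompt)
```

The prompt now carries a few short lines instead of hundreds of raw values, cutting token cost while preserving the signal the model needs.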

Verified 2026-04 · gpt-4o, claude-3-5-sonnet-20241022