How to use ChatGPT advanced data analysis
Quick answer
Use gpt-4o or a similar ChatGPT model for advanced data analysis by sending your data and analysis instructions through the OpenAI API. Embed the data in the prompt or upload files, then request computations, summaries, or visualization code directly from the model.
Prerequisites
- Python 3.8+
- OpenAI API key (free tier works)
- pip install "openai>=1.0"
- Basic knowledge of Python data libraries (pandas, matplotlib)
Setup
Install the OpenAI Python SDK and set your API key as an environment variable to authenticate requests.
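Once the variable is set, a quick check can confirm that Python actually sees it before any request is made. The following is a minimal sketch: the helper name get_api_key is an illustration and not part of the OpenAI SDK, and it assumes the key is stored under OPENAI_API_KEY, as in the examples in this article.

```python
import os


def get_api_key(env=os.environ):
    """Return the API key from the environment, or fail with a clear message.

    Hypothetical helper for illustration; `env` can be any mapping,
    which makes the check easy to exercise without touching real settings.
    """
    key = env.get("OPENAI_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "OPENAI_API_KEY is not set; export it before running the examples."
        )
    return key
```

Calling get_api_key() at the top of a script surfaces a missing key immediately instead of partway through an analysis run.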
pip install openai
Step by step
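This example builds a small sales table with pandas, converts it to a CSV string, sends it to gpt-4o together with an analysis prompt, and prints the model's response. To analyze a real CSV file, load it with pd.read_csv instead of constructing the DataFrame by hand.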
import os

import pandas as pd
from openai import OpenAI

# Build a small sample table; for a real file, load it with pd.read_csv(...) instead
data = pd.DataFrame({
    'Month': ['Jan', 'Feb', 'Mar', 'Apr'],
    'Sales': [150, 200, 170, 220],
})

# Convert the data to a CSV string so it can be embedded in the prompt
csv_data = data.to_csv(index=False)

# Initialize the OpenAI client with the key from the environment
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

# Compose the prompt with the data inline
prompt = f"Analyze the following sales data and provide insights:\n{csv_data}\nSummary:"

# Send the prompt to gpt-4o through the Chat Completions API
tokens_limit = 1000  # upper bound on the length of the reply
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": prompt}],
    max_tokens=tokens_limit,
)

print(response.choices[0].message.content)
output
The sales data shows a steady increase from January to April, with February and April having the highest sales. Consider focusing marketing efforts in these months to maximize revenue.
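Embedding the whole CSV string works for a four-row table, but larger files can exceed what fits comfortably in a prompt. One possible approach, sketched below with the standard csv module, is to cap the number of rows and tell the model how many were omitted. The names cap_csv_rows and build_prompt and the default of 50 rows are assumptions made for illustration, not part of pandas or the OpenAI SDK.

```python
import csv
import io


def cap_csv_rows(csv_text, max_rows=50):
    """Return (sample, total_rows): the header plus at most max_rows data rows."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    if not rows:
        return "", 0
    header, body = rows[0], rows[1:]
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(body[:max_rows])
    return out.getvalue(), len(body)


def build_prompt(csv_text, max_rows=50):
    """Compose an analysis prompt that states how much of the data is shown."""
    sample, total = cap_csv_rows(csv_text, max_rows)
    shown = min(total, max_rows)
    return (
        f"Analyze the following sales data (showing {shown} of {total} rows) "
        f"and provide insights:\n{sample}\nSummary:"
    )
```

With the example above, build_prompt(csv_data) could replace the hand-written f-string. Truncating to the first rows is the simplest choice; sampling or pre-aggregating with pandas may represent a large file better.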
Common variations
- Use gpt-4o-mini for faster, lower-cost analysis with smaller data.
- Stream responses by setting stream=True in the API call for real-time output.
- Use async calls with Python asyncio for concurrent data analysis tasks.
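The streaming variation can be sketched as follows. With stream=True, the API returns an iterator of chunks, each carrying a small piece of text in choices[0].delta.content. The helper names collect_stream and stream_analysis are hypothetical and chosen for this illustration. The SDK import sits inside the function so that the chunk-joining logic can be tried without a network call.

```python
def collect_stream(chunks, echo=True):
    """Join the text deltas from a streamed chat completion into one string."""
    parts = []
    for chunk in chunks:
        # Some chunks carry no choices or an empty delta (for example the last one)
        if not chunk.choices:
            continue
        piece = chunk.choices[0].delta.content
        if piece:
            parts.append(piece)
            if echo:
                # Print each piece as it arrives for real-time output
                print(piece, end="", flush=True)
    return "".join(parts)


def stream_analysis(prompt, model="gpt-4o-mini"):
    """Hypothetical helper: send one prompt with stream=True, return the full text."""
    from openai import OpenAI  # imported lazily; requires openai>=1.0

    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    stream = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        stream=True,
    )
    return collect_stream(stream)
```

Returning the joined text as well as printing it keeps the result available for later steps, such as saving the summary alongside the data.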
import asyncio
import os

from openai import AsyncOpenAI


async def analyze_data_async():
    # AsyncOpenAI exposes the same methods as OpenAI, but they are awaitable
    client = AsyncOpenAI(api_key=os.environ["OPENAI_API_KEY"])
    prompt = "Analyze the sales data: Month: Jan, Feb, Mar; Sales: 100, 150, 130. Provide a summary."
    response = await client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
        max_tokens=500,
    )
    print(response.choices[0].message.content)


asyncio.run(analyze_data_async())
output
Sales increased from January to February but dipped slightly in March, indicating a need to investigate March's sales drop.
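A single awaited request does not gain much from asyncio. The benefit appears when several analyses run at once. The sketch below shows one way to do that with asyncio.gather and a semaphore that caps the number of requests in flight. The function names and the default limit of 3 are assumptions for illustration, and analyze_with_openai assumes openai>=1.0 with OPENAI_API_KEY set in the environment.

```python
import asyncio


async def analyze_many(prompts, analyze_one, max_concurrency=3):
    """Run analyze_one(prompt) for every prompt, at most max_concurrency at a time.

    Results come back in the same order as the prompts.
    """
    semaphore = asyncio.Semaphore(max_concurrency)

    async def guarded(prompt):
        async with semaphore:
            return await analyze_one(prompt)

    return await asyncio.gather(*(guarded(p) for p in prompts))


async def analyze_with_openai(prompt, model="gpt-4o-mini"):
    """Hypothetical per-prompt coroutine that calls the Chat Completions API."""
    from openai import AsyncOpenAI  # imported lazily; requires openai>=1.0

    client = AsyncOpenAI()  # reads OPENAI_API_KEY from the environment
    response = await client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        max_tokens=500,
    )
    return response.choices[0].message.content
```

A call such as asyncio.run(analyze_many(prompts, analyze_with_openai)) would then return one summary per prompt. Creating a client per call keeps the sketch short; sharing one AsyncOpenAI instance across calls is likely the better choice in longer-running code.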
Troubleshooting
- If you receive RateLimitError, reduce request frequency or upgrade your plan.
- If the model output is incomplete, increase max_tokens or use streaming.
- For JSON or structured output, specify the format clearly in your prompt.
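Reducing request frequency after a RateLimitError is commonly done with exponential backoff: wait, retry, and double the wait after each failure. The helper below is a generic sketch, not an SDK feature. Its name, its parameters, and the jitter formula are assumptions for illustration, and the exception types to retry on are passed in by the caller.

```python
import random
import time


def call_with_backoff(call, retry_on=(Exception,), max_attempts=5,
                      base_delay=1.0, sleep=time.sleep):
    """Call call() and retry on the given exceptions with exponential backoff.

    The delay doubles after each failed attempt (1s, 2s, 4s, ...) plus random
    jitter of up to half the delay. The last failure is re-raised.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return call()
        except retry_on:
            if attempt == max_attempts:
                raise
            delay = base_delay * 2 ** (attempt - 1)
            sleep(delay + random.uniform(0, delay / 2))
```

With the OpenAI SDK, usage might look like call_with_backoff(lambda: client.chat.completions.create(...), retry_on=(openai.RateLimitError,)). The client also accepts a max_retries setting of its own, so a wrapper like this is mainly useful when more control over the waiting schedule is needed.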
Key Takeaways
- Use gpt-4o with data embedded in prompts for advanced analysis.
- Convert data to CSV or JSON strings for easy ingestion by the model.
- Leverage async and streaming for efficient, real-time data workflows.