How to use AI to analyze CSV files
Quick answer
Use a Python script to read CSV files and send relevant data or summaries as prompts to an AI model like gpt-4o via the OpenAI API. The AI can then analyze, summarize, or generate insights from the CSV content based on your instructions.
Prerequisites
- Python 3.8+
- OpenAI API key (free tier works)
- pip install "openai>=1.0" (quoted so the shell does not treat > as a redirect)
- Basic knowledge of CSV and Python file handling
Setup
Install the OpenAI Python SDK and set your API key as an environment variable to authenticate requests.
pip install "openai>=1.0"
Step by step
This example reads a CSV file, extracts the header and the first five data rows as text, and sends them to gpt-4o to get a summary analysis.
import os
import csv
from itertools import islice
from openai import OpenAI

# The client authenticates with the key stored in OPENAI_API_KEY
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

# Read the CSV and keep the header plus up to 5 data rows
with open('data.csv', 'r', encoding='utf-8', newline='') as f:
    reader = csv.reader(f)
    headers = next(reader)
    # islice stops cleanly if the file has fewer than 5 data rows
    rows = list(islice(reader, 5))

# Prepare prompt with CSV snippet
csv_text = ", ".join(headers) + "\n"
csv_text += "\n".join(", ".join(row) for row in rows)
prompt = f"Analyze the following CSV data and provide a summary of key insights:\n{csv_text}"

# Send the prompt to the model and print its reply
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": prompt}]
)
print(response.choices[0].message.content)
output
Summary: The CSV data shows columns for Name, Age, Department, Salary, and Start Date. Among the first 5 entries, the average age is 29, with most employees in the Sales and Engineering departments. Salaries range from $50,000 to $85,000, indicating a mid-level workforce.
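The example above sends raw rows. For larger files, the quick answer also mentions sending summaries instead. The following is a minimal sketch of that idea, not a definitive implementation: the helper name summarize_csv and the max_rows parameter are hypothetical, and it assumes a column counts as numeric only when every non-empty value parses as a float.

```python
import csv
import statistics


def summarize_csv(path, max_rows=1000):
    """Build a compact text summary of a CSV file (hypothetical helper).

    Assumption: a column is treated as numeric only if every non-empty
    value in it parses as a float; everything else is reported as text.
    """
    with open(path, "r", encoding="utf-8", newline="") as f:
        reader = csv.DictReader(f)
        columns = {name: [] for name in (reader.fieldnames or [])}
        row_count = 0
        for row in reader:
            if row_count >= max_rows:
                break  # cap the work done on very large files
            row_count += 1
            for name in columns:
                value = (row.get(name) or "").strip()
                if value:
                    columns[name].append(value)

    lines = [f"Rows read: {row_count}"]
    for name, values in columns.items():
        try:
            numbers = [float(v) for v in values]
        except ValueError:
            numbers = None
        if numbers:
            lines.append(
                f"{name}: numeric, min={min(numbers):g}, "
                f"max={max(numbers):g}, mean={statistics.mean(numbers):g}"
            )
        else:
            lines.append(
                f"{name}: text, {len(set(values))} distinct non-empty values"
            )
    return "\n".join(lines)
```

The returned string can be placed in the prompt in place of csv_text, so the model sees per-column statistics rather than individual rows.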
Common variations
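You can use asynchronous calls to analyze several CSV files or chunks concurrently, stream responses to show the analysis as it is generated, or switch to other models like claude-3-5-sonnet-20241022 for potentially better coding and data understanding. Note that Claude models are served through Anthropic's API and SDK, not through the OpenAI client used here. The example below shows the asynchronous variant.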
import os
import csv
import asyncio
from itertools import islice
from openai import AsyncOpenAI

# openai>=1.0 has no acreate(); use AsyncOpenAI and await create() instead
client = AsyncOpenAI(api_key=os.environ["OPENAI_API_KEY"])

async def analyze_csv_async():
    # Read the header plus up to 5 data rows
    with open('data.csv', 'r', encoding='utf-8', newline='') as f:
        reader = csv.reader(f)
        headers = next(reader)
        rows = list(islice(reader, 5))

    csv_text = ", ".join(headers) + "\n"
    csv_text += "\n".join(", ".join(row) for row in rows)
    prompt = f"Analyze the following CSV data and summarize key points:\n{csv_text}"

    # The await yields control while the request is in flight
    response = await client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}]
    )
    print(response.choices[0].message.content)

asyncio.run(analyze_csv_async())
output
Summary: The CSV snippet includes employee details with an average age of 29 and a concentration in Sales and Engineering departments.
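Streaming is listed among the variations but not shown. The following is a minimal sketch of how it might look with the synchronous client, assuming openai>=1.0, where passing stream=True to chat.completions.create returns an iterator of chunks. The helper name stream_analysis is hypothetical, and the client is passed in as an argument so the function itself does not depend on how it was created.

```python
def stream_analysis(client, prompt, model="gpt-4o"):
    """Print the model's reply as it arrives and return the full text.

    `client` is assumed to be an `openai.OpenAI` instance (openai>=1.0);
    `stream_analysis` is a hypothetical helper name.
    """
    stream = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        stream=True,  # ask the API to send the reply in incremental chunks
    )
    parts = []
    for chunk in stream:
        # Some chunks carry no choices or no text, so skip those.
        if not chunk.choices:
            continue
        text = chunk.choices[0].delta.content
        if text:
            print(text, end="", flush=True)
            parts.append(text)
    print()  # finish the line once the stream ends
    return "".join(parts)


# Usage (requires the openai package and OPENAI_API_KEY to be set):
#   import os
#   from openai import OpenAI
#   client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])
#   stream_analysis(client, "Summarize this CSV:\nName, Age\nAna, 30")
```

Returning the joined text as well as printing it keeps the helper usable both interactively and in scripts that need the full summary afterwards.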
Troubleshooting
- If you get an authentication error, verify that your OPENAI_API_KEY environment variable is set correctly.
- If the CSV is too large, consider sending only relevant slices or summaries to the AI to avoid token limits.
- For unexpected output, refine your prompt to be more specific about the analysis you want.
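For the large-CSV case above, one way to send only a slice is to cap the snippet by size. The following is a rough sketch under stated assumptions: the helper names and the max_chars parameter are hypothetical, and character count is only a crude stand-in for tokens, since the real limit depends on the model and its tokenizer.

```python
import csv
import io


def _format_row(row):
    """Serialize one row back to CSV text, quoting fields where needed."""
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerow(row)
    return out.getvalue()


def csv_snippet_within_budget(path, max_chars=4000):
    """Return the header plus as many leading rows as fit in max_chars.

    Assumption: characters approximate tokens only loosely. The header is
    always included, even if it alone exceeds the budget.
    """
    with open(path, "r", encoding="utf-8", newline="") as f:
        reader = csv.reader(f)
        header = next(reader, None)
        if header is None:
            return ""  # empty file, nothing to send
        parts = [_format_row(header)]
        used = len(parts[0])
        for row in reader:
            line = _format_row(row)
            if used + len(line) > max_chars:
                break  # adding this row would exceed the budget
            parts.append(line)
            used += len(line)
    return "".join(parts)
```

Taking the leading rows is the simplest choice; if the first rows are not representative, sampling rows or sending per-column statistics may give the model a better picture.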
Key Takeaways
- Use Python's csv module to read and preprocess CSV data before sending it to an AI model.
- Send concise CSV snippets as prompt text to gpt-4o or similar models for analysis and summarization.
- Async calls can improve throughput when analyzing several files or chunks, and streaming can improve responsiveness in interactive use cases.