How to use AI to write pandas code
Quick answer
Use an AI model such as gpt-4o to generate pandas code by providing clear prompts that describe your data task. Call the chat.completions.create API with your prompt, and the model returns Python code snippets that use pandas for data manipulation or analysis.
PREREQUISITES
- Python 3.8+
- OpenAI API key (free tier works)
- pip install "openai>=1.0"
- Basic knowledge of the pandas library
Setup
Install the OpenAI Python SDK and set your API key as an environment variable to authenticate requests.
pip install "openai>=1.0"
Step by step
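Before sending the first request, it can help to confirm that the key is actually visible to Python. The following is a minimal sketch; `require_api_key` is a hypothetical helper name used for illustration, not part of the OpenAI SDK.

```python
import os
from typing import Mapping


def require_api_key(env: Mapping[str, str] = os.environ,
                    name: str = "OPENAI_API_KEY") -> str:
    """Return the key stored under `name`, or raise with a readable message."""
    # Treat a missing variable and an empty or whitespace-only value the same way
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(f"{name} is not set; export it in your shell before running.")
    return key
```

Calling `require_api_key()` at the top of a script fails early with a clear message instead of a later authentication error.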
Send a prompt describing your data task to the OpenAI gpt-4o model. The reply contains Python code that uses pandas; review it before running it on real data.
import os
from openai import OpenAI

# Read the API key from the environment rather than hard-coding it
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

# Describe the data task in plain language
prompt = (
    "Write pandas code to load a CSV file named 'data.csv', "
    "filter rows where the 'age' column is greater than 30, "
    "and calculate the average salary."
)

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": prompt}]
)

# The generated code is the text of the first choice
code_snippet = response.choices[0].message.content
print("Generated pandas code:\n", code_snippet)
output
Generated pandas code:
import pandas as pd
df = pd.read_csv('data.csv')
filtered = df[df['age'] > 30]
avg_salary = filtered['salary'].mean()
print(f"Average salary for age > 30: {avg_salary}")
Common variations
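The reply is plain text, and in practice it may wrap the code in Markdown fences with explanatory prose around it. A minimal sketch for pulling out just the code follows; `extract_code` is a hypothetical helper, and the pattern assumes conventional triple-backtick fences with an optional language tag.

```python
import re

# Build the fence marker from single backticks so this snippet nests cleanly
FENCE = "`" * 3
FENCE_RE = re.compile(FENCE + r"[ \t]*([\w+-]*)[ \t]*\n(.*?)" + FENCE, re.DOTALL)


def extract_code(reply: str) -> str:
    """Return the fenced code in `reply`, or the whole reply if no fence is found."""
    blocks = [body for _lang, body in FENCE_RE.findall(reply)]
    if not blocks:
        # No fences: assume the model returned bare code
        return reply.strip()
    # Join multiple fenced blocks with a blank line between them
    return "\n\n".join(block.strip() for block in blocks)
```

For example, `extract_code(code_snippet)` could be applied to the reply before saving it to a file for review.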
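You can use async calls, stream responses for large outputs, or switch to another model. Note that claude-3-5-sonnet-20241022 is an Anthropic model and is served through Anthropic's own SDK rather than the OpenAI client shown here. Adjust prompts to specify the output format or to request comments.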
import os
import asyncio
from openai import AsyncOpenAI

# AsyncOpenAI is the awaitable counterpart of OpenAI in openai>=1.0
client = AsyncOpenAI(api_key=os.environ["OPENAI_API_KEY"])

async def generate_pandas_code():
    prompt = "Write pandas code to group data by 'department' and sum 'sales'."
    # With the async client, create() itself is awaited; there is no acreate() method
    response = await client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}]
    )
    print(response.choices[0].message.content)

asyncio.run(generate_pandas_code())
output
import pandas as pd
df = pd.read_csv('data.csv')
grouped = df.groupby('department')['sales'].sum()
print(grouped)
Troubleshooting
- If the generated code has syntax errors, clarify your prompt to ask for runnable Python code.
- If the output is incomplete, use streaming or increase max_tokens.
- For authentication errors, verify your API key is set correctly in os.environ.
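The incomplete-output case can be detected rather than guessed at. The sketch below streams a reply and reports whether it stopped because it reached the token limit, which the API signals with a finish_reason of "length". `stream_reply` is a hypothetical helper, the default of 1024 tokens is an arbitrary assumption, and the client is passed in so the function works with any object exposing chat.completions.create.

```python
from typing import Any, Tuple


def stream_reply(client: Any, prompt: str, model: str = "gpt-4o",
                 max_tokens: int = 1024) -> Tuple[str, bool]:
    """Stream a chat completion and return (text, truncated)."""
    stream = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        max_tokens=max_tokens,
        stream=True,
    )
    parts = []
    truncated = False
    for chunk in stream:
        if not chunk.choices:
            continue  # skip chunks that carry no choices
        choice = chunk.choices[0]
        if choice.delta.content:
            parts.append(choice.delta.content)
        # "length" means the reply stopped at the token limit
        if choice.finish_reason == "length":
            truncated = True
    return "".join(parts), truncated
```

With the synchronous client from the step-by-step example, `text, truncated = stream_reply(client, prompt)` returns the full text; if `truncated` is true, raise max_tokens or ask for a shorter answer.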
Key Takeaways
- Use clear, specific prompts to get accurate pandas code from AI models.
- The OpenAI gpt-4o model can generate runnable pandas snippets, but review the code before running it.
- Async and streaming calls help handle longer or more complex code outputs.
- Always keep your API key secure and set via environment variables.
- Refine prompts iteratively to improve code quality and relevance.
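Iterative refinement can be partly automated. The sketch below parses each reply with Python's ast module, which checks syntax without executing anything, and feeds any syntax error back into the next prompt. Both function names are hypothetical, and `generate` stands for any callable that sends a prompt to the model and returns the code as a string.

```python
import ast
from typing import Callable, Optional


def syntax_error_in(code: str) -> Optional[str]:
    """Return a short description of the first syntax error, or None if the code parses."""
    try:
        ast.parse(code)  # parses only; the code is never run
    except SyntaxError as exc:
        return f"line {exc.lineno}: {exc.msg}"
    return None


def refine_until_valid(generate: Callable[[str], str], prompt: str,
                       max_rounds: int = 3) -> str:
    """Call `generate` until its output parses, adding the error to the prompt each round."""
    current_prompt = prompt
    for _ in range(max_rounds):
        code = generate(current_prompt)
        error = syntax_error_in(code)
        if error is None:
            return code
        # Keep the original task and append what went wrong last time
        current_prompt = (
            f"{prompt}\n\nThe previous attempt failed to parse ({error}). "
            "Return only corrected, runnable Python code."
        )
    raise ValueError(f"no syntactically valid code after {max_rounds} rounds")
```

A syntax check only shows that the code parses; it says nothing about whether the pandas logic is correct, so the result still needs a read-through and a run on sample data.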