How-to · Beginner · 3 min read

How to test prompts systematically

Quick answer
Use Python scripts with the OpenAI SDK to automate prompt testing by sending multiple prompt variations to models like gpt-4o. Collect and compare outputs programmatically to evaluate prompt effectiveness and consistency.

PREREQUISITES

  • Python 3.8+
  • OpenAI API key (free tier works)
  • pip install "openai>=1.0"

Setup

Install the openai Python package and set your API key as an environment variable so the key stays out of your source code.

bash
pip install "openai>=1.0"
output
Collecting openai
  Downloading openai-1.x.x-py3-none-any.whl (xx kB)
Installing collected packages: openai
Successfully installed openai-1.x.x
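
The SDK reads the key from the OPENAI_API_KEY environment variable. Export it in your shell before running any scripts; the value below is a placeholder, so substitute your own key from the OpenAI dashboard:

```shell
# Replace the placeholder with your actual key; keep it out of source control
export OPENAI_API_KEY="sk-..."
```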

Step by step

This example demonstrates how to systematically test multiple prompt variations against the gpt-4o model using the OpenAI SDK. It sends each prompt, collects responses, and prints them for comparison.

python
import os
from openai import OpenAI

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

prompts = [
    "Explain the benefits of renewable energy.",
    "List three advantages of renewable energy.",
    "Why is renewable energy important for the environment?"
]

for i, prompt in enumerate(prompts, 1):
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}]
    )
    text = response.choices[0].message.content
    print(f"Prompt {i}: {prompt}\nResponse:\n{text}\n{'-'*40}")
output
Prompt 1: Explain the benefits of renewable energy.
Response:
Renewable energy reduces greenhouse gas emissions, decreases dependence on fossil fuels, and promotes sustainable development.
----------------------------------------
Prompt 2: List three advantages of renewable energy.
Response:
1. Reduces carbon footprint
2. Provides sustainable power
3. Lowers energy costs over time
----------------------------------------
Prompt 3: Why is renewable energy important for the environment?
Response:
It helps combat climate change by reducing pollution and conserving natural resources.
----------------------------------------
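
Printing responses side by side is a start; for more systematic comparison you can also score each response programmatically. The keyword-coverage metric below is an illustrative sketch, not part of the SDK:

```python
def keyword_coverage(text, keywords):
    """Fraction of expected keywords that appear in the response
    (case-insensitive). A crude but repeatable comparison metric."""
    text_lower = text.lower()
    hits = sum(1 for kw in keywords if kw.lower() in text_lower)
    return hits / len(keywords)

# Score a collected response against terms a good answer should mention
expected = ["emissions", "sustainable", "fossil"]
response = ("Renewable energy reduces greenhouse gas emissions, decreases "
            "dependence on fossil fuels, and promotes sustainable development.")
print(f"coverage: {keyword_coverage(response, expected):.2f}")
```

Running the same metric over every prompt's response turns "which prompt worked best?" into a number you can sort by.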

Common variations

You can extend prompt testing by using asynchronous calls for faster batch processing, testing different models like gpt-4o-mini, or enabling streaming to monitor partial outputs in real time.

python
import asyncio
import os
from openai import AsyncOpenAI

# AsyncOpenAI exposes awaitable versions of the same methods
client = AsyncOpenAI(api_key=os.environ["OPENAI_API_KEY"])

async def test_prompts_async(prompts):
    tasks = [
        client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[{"role": "user", "content": prompt}]
        )
        for prompt in prompts
    ]
    responses = await asyncio.gather(*tasks)
    for i, response in enumerate(responses, 1):
        print(f"Async Prompt {i}: {prompts[i-1]}\nResponse:\n"
              f"{response.choices[0].message.content}\n{'-'*40}")

prompts = [
    "What are the health benefits of meditation?",
    "Explain meditation benefits in three points."
]

asyncio.run(test_prompts_async(prompts))
output
Async Prompt 1: What are the health benefits of meditation?
Response:
Meditation reduces stress, improves concentration, and enhances emotional health.
----------------------------------------
Async Prompt 2: Explain meditation benefits in three points.
Response:
1. Lowers anxiety
2. Boosts focus
3. Promotes emotional well-being
----------------------------------------
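
Streaming (passing stream=True to client.chat.completions.create) delivers the response as a sequence of text deltas rather than one blob. Because the real call needs an API key, the sketch below shows only the accumulation pattern, using a simulated list of deltas:

```python
def collect_stream(deltas):
    """Accumulate streamed text deltas into the full response, printing
    each as it arrives. The same loop works on the delta.content fields
    of real streamed chunks, which can also be None (role/finish chunks)."""
    parts = []
    for delta in deltas:
        if delta:  # skip None deltas
            print(delta, end="", flush=True)
            parts.append(delta)
    print()
    return "".join(parts)

# Simulated deltas standing in for chunk.choices[0].delta.content
full = collect_stream(["Meditation ", None, "reduces ", "stress."])
```

With a real stream, replace the list with `for chunk in stream:` and read `chunk.choices[0].delta.content` for each chunk.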

Troubleshooting

  • If you receive authentication errors, verify your OPENAI_API_KEY environment variable is set correctly.
  • For rate limit errors, add delays between requests or reduce batch size.
  • If outputs are inconsistent, ensure prompt formatting is consistent and consider using temperature=0 for deterministic responses.
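
The rate-limit advice above can be wrapped in a small retry helper. Here is a sketch with exponential backoff; in real code you would catch openai.RateLimitError rather than bare Exception:

```python
import time

def with_retries(call, max_attempts=5, base_delay=1.0):
    """Call `call()`, retrying on failure and doubling the delay each time.
    Re-raises the last error once max_attempts is exhausted."""
    for attempt in range(max_attempts):
        try:
            return call()
        except Exception:  # narrow to openai.RateLimitError in real code
            if attempt == max_attempts - 1:
                raise
            time.sleep(base_delay * 2 ** attempt)

# Usage (hypothetical): with_retries(lambda: client.chat.completions.create(...))
```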

Key Takeaways

  • Automate prompt testing by scripting multiple prompt calls with the OpenAI SDK.
  • Use asynchronous calls to speed up batch prompt evaluations.
  • Compare outputs side-by-side to identify the most effective prompt formulations.
  • Set temperature=0 for consistent, deterministic responses during testing.
  • Handle API errors by checking environment variables and managing request rates.
Verified 2026-04 · gpt-4o, gpt-4o-mini