How to use wandb for prompt optimization

Quick answer

Use wandb to track prompt optimization experiments: initialize a run, log prompts and model outputs (as tables or artifacts) together with evaluation metrics, and compare results across runs. This makes prompt tuning systematic and lets you visualize performance improvements.

Prerequisites

Python 3.8+
pip install wandb "openai>=1.0" (quote the version specifier so the shell does not treat > as a redirect)
OpenAI API key (or other AI API key)
wandb account and API key

Setup

Install wandb and set up your API keys as environment variables. This allows you to log experiments and track prompt optimization runs.

bash

pip install wandb "openai>=1.0"
export WANDB_API_KEY=<your_wandb_api_key>
export OPENAI_API_KEY=<your_openai_api_key>
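
Before starting a run, it can help to fail fast when a key is missing. The following is a minimal sketch of such a check; the helper names `missing_keys` and `require_keys` are hypothetical and are not part of wandb or the OpenAI SDK.

```python
import os

# Environment variables this tutorial expects (names taken from the Setup step).
REQUIRED_KEYS = ("WANDB_API_KEY", "OPENAI_API_KEY")


def missing_keys(env, required=REQUIRED_KEYS):
    """Return the required variable names that are unset or empty in env."""
    return [name for name in required if not env.get(name)]


def require_keys(env=None):
    """Raise a clear error if any required key is missing from the environment."""
    env = os.environ if env is None else env
    missing = missing_keys(env)
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
```

Calling `require_keys()` at the top of a script stops it before any API request is made if a key was not exported.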

Step by step

This example shows how to use wandb to log prompt variations and the outputs they produce from OpenAI's gpt-4o model. Tracking prompt text, model responses, and evaluation metrics side by side is what lets you compare prompts and pick the best one.

python

import os
import wandb
from openai import OpenAI

# Initialize a wandb run; replace "your-entity" with your wandb username or team
wandb.init(project="prompt-optimization", entity="your-entity")

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

# Define prompt variations to test
prompts = [
    "Explain the benefits of AI in healthcare.",
    "List three advantages of AI in medicine.",
    "How does AI improve patient outcomes?"
]

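# Collect results in a table so the text columns are easy to browse in the wandb UI
table = wandb.Table(columns=["prompt_id", "prompt_text", "response_text"])

for i, prompt in enumerate(prompts, 1):
    # Call the OpenAI chat completions endpoint
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}]
    )
    text = response.choices[0].message.content

    # Record the prompt and its output, plus a numeric value that can be charted
    table.add_data(i, prompt, text)
    wandb.log({"prompt_id": i, "response_length": len(text)})

    print(f"Prompt {i}: {prompt}\nResponse: {text}\n")

# Log the full table once, then close the run
wandb.log({"prompt_results": table})
wandb.finish()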

output

Prompt 1: Explain the benefits of AI in healthcare.
Response: AI improves healthcare by enabling faster diagnosis, personalized treatment, and efficient data analysis.

Prompt 2: List three advantages of AI in medicine.
Response: 1. Enhanced diagnostic accuracy 2. Predictive analytics for patient care 3. Automation of routine tasks.

Prompt 3: How does AI improve patient outcomes?
Response: AI helps by providing tailored treatment plans, early detection of diseases, and continuous monitoring.
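
To compare prompts quantitatively, each response needs a numeric score. The sketch below uses a deliberately simple keyword-coverage score as a stand-in for a real metric such as BLEU or human ratings. The function names `keyword_coverage`, `score_rows`, and `log_rows`, and the idea of a keyword list, are illustrative assumptions rather than anything wandb provides.

```python
def keyword_coverage(response, keywords):
    """Fraction of keywords that appear in the response (case-insensitive)."""
    if not keywords:
        return 0.0
    text = response.lower()
    hits = sum(1 for kw in keywords if kw.lower() in text)
    return hits / len(keywords)


def score_rows(pairs, keywords):
    """Build one dict of loggable numeric values per (prompt, response) pair."""
    rows = []
    for i, (prompt, response) in enumerate(pairs, 1):
        rows.append({
            "prompt_id": i,
            "keyword_coverage": keyword_coverage(response, keywords),
            "response_words": len(response.split()),
        })
    return rows


def log_rows(rows, project="prompt-optimization"):
    """Log each row as one wandb step; needs wandb installed and a valid login."""
    import wandb  # imported lazily so the scoring helpers work without wandb

    run = wandb.init(project=project)
    for row in rows:
        run.log(row)
    run.finish()
```

Passing the result of `score_rows(...)` to `log_rows(...)` records one step per prompt, so the scores can be plotted and compared in the run's workspace.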

Common variations

You can extend this approach by:

Logging evaluation metrics such as BLEU or human ratings to wandb for quantitative prompt comparison.
Using wandb.Artifact to save prompt sets and model outputs as versioned datasets.
Running asynchronous or batch prompt tests with streaming outputs logged incrementally.
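Switching models: for another OpenAI model (e.g., gpt-4o-mini), change the model parameter; for a model from another provider (e.g., claude-3-5-sonnet-20241022), also switch to that provider's SDK or an OpenAI-compatible endpoint.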
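
As one way to version a prompt set, the records can be written to a JSON file and uploaded with wandb.Artifact. This is a sketch under assumptions: the file layout, the artifact name "prompt-set", and the helper names `write_prompt_set` and `log_prompt_artifact` are illustrative choices, not a prescribed format.

```python
import json
from pathlib import Path


def write_prompt_set(records, path):
    """Write prompt/response records to a JSON file and return its path."""
    path = Path(path)
    path.write_text(json.dumps(records, indent=2), encoding="utf-8")
    return path


def log_prompt_artifact(path, project="prompt-optimization", name="prompt-set"):
    """Upload the file as a wandb artifact; needs wandb installed and a valid login."""
    import wandb  # lazy import: only needed when actually uploading

    run = wandb.init(project=project, job_type="dataset-upload")
    artifact = wandb.Artifact(name=name, type="dataset")
    artifact.add_file(str(path))
    run.log_artifact(artifact)
    run.finish()
```

Logging an artifact under the same name after its contents change creates a new version, so earlier prompt sets stay retrievable.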

Troubleshooting

If wandb does not log data:

Ensure your WANDB_API_KEY environment variable is set correctly.
Check your internet connection and firewall settings.
Verify wandb.init() is called before logging.
Use wandb.login() interactively if environment variable setup fails.

If OpenAI API calls fail, confirm your OPENAI_API_KEY is valid and has sufficient quota.
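
If connectivity or credentials are the problem, one possible fallback is to record the run locally in wandb's offline mode. The sketch below assumes the key is supplied only through the environment (it does not detect a prior interactive login), and `choose_wandb_mode` and `start_run` are hypothetical helper names.

```python
def choose_wandb_mode(env):
    """Respect WANDB_MODE if set; otherwise go offline when no API key is present."""
    explicit = env.get("WANDB_MODE")
    if explicit:
        return explicit
    return "online" if env.get("WANDB_API_KEY") else "offline"


def start_run(env, project="prompt-optimization"):
    """Start a wandb run in the chosen mode; needs wandb installed."""
    import wandb  # lazy import so the mode helper can be used on its own

    return wandb.init(project=project, mode=choose_wandb_mode(env))
```

Runs recorded offline are stored locally and can be uploaded later with the `wandb sync` command.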

Key Takeaways

Use wandb.init() to start tracking prompt optimization experiments.
Log prompt texts and model outputs with wandb.log() for easy comparison.
Leverage wandb.Artifact to version datasets of prompts and responses.
Track evaluation metrics alongside prompts so you can compare them quantitatively.
Ensure environment variables for API keys are set before running scripts.

Verified 2026-04 · gpt-4o, gpt-4o-mini, claude-3-5-sonnet-20241022
