How to use wandb for prompt optimization
Quick answer
Use wandb to track prompt optimization experiments: initialize a wandb run, log each prompt and model output together with any evaluation metrics (or save them as artifacts), and compare results across runs. This makes prompt tuning systematic and lets you visualize how performance changes from one prompt variation to the next.
PREREQUISITES
- Python 3.8+
- pip install wandb "openai>=1.0"
- OpenAI API key (or other AI API key)
- wandb account and API key
Setup
Install wandb and set up your API keys as environment variables. This allows you to log experiments and track prompt optimization runs.
- Install packages:
pip install wandb openai
- Set environment variables:
export WANDB_API_KEY=<your_wandb_api_key>
export OPENAI_API_KEY=<your_openai_api_key>
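Before starting a run, it can help to confirm that both keys are actually visible to Python. The following is a minimal sketch, not part of wandb or the OpenAI SDK: the helper name `missing_env_vars` and the `REQUIRED_VARS` tuple are illustrative assumptions.

```python
import os

# Variable names taken from the setup step above.
REQUIRED_VARS = ("WANDB_API_KEY", "OPENAI_API_KEY")


def missing_env_vars(names=REQUIRED_VARS, env=None):
    """Return the names that are unset or empty in the given environment."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]


# Report which keys, if any, still need to be exported.
missing = missing_env_vars()
print("Missing:", ", ".join(missing) if missing else "none")
```

Running this at the top of a script gives a clear message up front instead of an authentication error partway through an experiment.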
Step by step
This example shows how to use wandb to log prompt variations and the outputs generated for them by OpenAI's gpt-4o model. You can track prompt text, model responses, and evaluation metrics to optimize prompts.
import os
import wandb
from openai import OpenAI

# Initialize wandb run
wandb.init(project="prompt-optimization", entity="your-entity")
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

# Define prompt variations to test
prompts = [
    "Explain the benefits of AI in healthcare.",
    "List three advantages of AI in medicine.",
    "How does AI improve patient outcomes?"
]

for i, prompt in enumerate(prompts, 1):
    # Call OpenAI chat completion
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}]
    )
    text = response.choices[0].message.content

    # Log prompt and output to wandb
    wandb.log({
        "prompt_id": i,
        "prompt_text": prompt,
        "response_text": text
    })
    print(f"Prompt {i}: {prompt}\nResponse: {text}\n")

wandb.finish()
Output
Prompt 1: Explain the benefits of AI in healthcare.
Response: AI improves healthcare by enabling faster diagnosis, personalized treatment, and efficient data analysis.

Prompt 2: List three advantages of AI in medicine.
Response: 1. Enhanced diagnostic accuracy 2. Predictive analytics for patient care 3. Automation of routine tasks.

Prompt 3: How does AI improve patient outcomes?
Response: AI helps by providing tailored treatment plans, early detection of diseases, and continuous monitoring.
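Free-text fields such as prompts and responses are often easier to compare side by side in a table than as individual logged values. The sketch below collects the prompt/response pairs into rows and logs them as a single `wandb.Table`. The helper names `build_rows` and `log_results_table`, and the table key `prompt_results`, are assumptions made for illustration. The wandb import sits inside the logging function so the row-building step can be tried without an account.

```python
COLUMNS = ["prompt_id", "prompt_text", "response_text"]


def build_rows(prompts, responses):
    """Pair each prompt with its response as [prompt_id, prompt_text, response_text]."""
    if len(prompts) != len(responses):
        raise ValueError("prompts and responses must have the same length")
    return [[i, p, r] for i, (p, r) in enumerate(zip(prompts, responses), 1)]


def log_results_table(rows, project="prompt-optimization"):
    """Log all rows as one wandb.Table (needs wandb installed and a valid login)."""
    import wandb  # imported lazily so build_rows works without wandb

    run = wandb.init(project=project)
    table = wandb.Table(columns=COLUMNS, data=rows)
    run.log({"prompt_results": table})
    run.finish()


rows = build_rows(
    ["Explain the benefits of AI in healthcare."],
    ["AI improves healthcare by enabling faster diagnosis."],
)
# log_results_table(rows)  # uncomment to send the table to wandb
```

In the loop above, the same rows could be accumulated per prompt and logged once after the loop finishes.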
Common variations
You can extend this approach by:
- Logging evaluation metrics such as BLEU or human ratings to wandb for quantitative prompt comparison.
- Using wandb.Artifact to save prompt sets and model outputs as versioned datasets.
- Running asynchronous or batch prompt tests with streaming outputs logged incrementally.
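- Switching models by changing the model parameter (e.g., gpt-4o-mini). A model from another provider, such as claude-3-5-sonnet-20241022, also requires that provider's client rather than the OpenAI client used above.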
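The first two variations can be sketched together: score each response with a numeric metric, log that score per prompt, and save the full records as a versioned artifact. The metric here, `keyword_coverage`, is a deliberately simple stand-in invented for this example, not BLEU or a human rating. The file name `prompt_results.jsonl` and the artifact name `prompt-results` are likewise assumptions.

```python
import json


def keyword_coverage(response, keywords):
    """Hypothetical metric: fraction of keywords found in the response (case-insensitive)."""
    if not keywords:
        return 0.0
    text = response.lower()
    hits = sum(1 for kw in keywords if kw.lower() in text)
    return hits / len(keywords)


def score_results(results, keywords):
    """Add a 'coverage' score to each {'prompt': ..., 'response': ...} record."""
    return [{**r, "coverage": keyword_coverage(r["response"], keywords)} for r in results]


def log_scored_results(scored, path="prompt_results.jsonl", project="prompt-optimization"):
    """Log one numeric metric per prompt and save all records as a versioned artifact."""
    import wandb  # lazy import: the scoring helpers above work without wandb

    # Write the records to a JSON Lines file so they can be attached to the artifact.
    with open(path, "w", encoding="utf-8") as f:
        for record in scored:
            f.write(json.dumps(record) + "\n")

    run = wandb.init(project=project)
    for i, record in enumerate(scored, 1):
        run.log({"prompt_id": i, "coverage": record["coverage"]})
    artifact = wandb.Artifact(name="prompt-results", type="dataset")
    artifact.add_file(path)
    run.log_artifact(artifact)
    run.finish()


results = [
    {"prompt": "List three advantages of AI in medicine.",
     "response": "Enhanced diagnostic accuracy, predictive analytics, automation."},
]
scored = score_results(results, keywords=["diagnostic", "predictive", "cost"])
# log_scored_results(scored)  # uncomment to log metrics and the artifact to wandb
```

Because the score is numeric, wandb can chart it across prompts and runs, which plain text fields do not allow. Any other scoring function with the same shape could replace `keyword_coverage`.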
Troubleshooting
If wandb does not log data:
- Ensure your WANDB_API_KEY environment variable is set correctly.
- Check your internet connection and firewall settings.
- Verify wandb.init() is called before logging.
- Use wandb.login() interactively if environment variable setup fails.
If OpenAI API calls fail, confirm your OPENAI_API_KEY is valid and has sufficient quota.
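When the key or the network is the problem, one possible workaround is to run wandb in offline mode so the experiment still records locally. This is a sketch under the assumption that the key is supplied only through `WANDB_API_KEY` (a machine already logged in via `wandb.login()` would not need the fallback). The helper names `pick_wandb_mode` and `start_run` are illustrative.

```python
import os


def pick_wandb_mode(env=None):
    """Return 'online' when WANDB_API_KEY is set, otherwise 'offline'."""
    env = os.environ if env is None else env
    return "online" if env.get("WANDB_API_KEY") else "offline"


def start_run(project="prompt-optimization"):
    """Start a run in the mode chosen above (needs wandb installed)."""
    import wandb  # lazy import so the mode check works without wandb

    return wandb.init(project=project, mode=pick_wandb_mode())


print("wandb mode:", pick_wandb_mode())
```

Runs recorded offline are stored in the local wandb directory and can be uploaded later with the `wandb sync` command once the key or connection is fixed.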
Key Takeaways
- Use wandb.init() to start tracking prompt optimization experiments.
- Log prompt texts and model outputs with wandb.log() for easy comparison.
- Leverage wandb.Artifact to version datasets of prompts and responses.
- Track evaluation metrics alongside prompts so you can compare prompt variations quantitatively.
- Ensure environment variables for API keys are set before running scripts.