How to use Arize Phoenix for LLM evaluation
Quick answer
Use the arize Python SDK to log your LLM's inputs, predictions, and ground-truth labels to Arize, then analyze model performance, drift, and errors in Arize's dashboard. The arize package sends data to the Arize platform. Arize Phoenix, Arize's open-source LLM observability tool, is a separate package installed with pip install arize-phoenix.
Prerequisites
- Python 3.8+
- An Arize account with an API key and a space key (sign up at arize.com)
- pip install arize
- Basic familiarity with LLM inference and evaluation
Setup
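Install the arize Python package, then set your Arize API key and space key as the environment variables ARIZE_API_KEY and ARIZE_SPACE_KEY so the client can authenticate.
pip install arize
export ARIZE_API_KEY="your-api-key"
export ARIZE_SPACE_KEY="your-space-key"
Step by step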
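Before creating a client, you can confirm that both credentials are visible to Python. This is a minimal standard-library sketch; missing_env_vars is a hypothetical helper, not part of the arize SDK.

```python
import os
from typing import List


def missing_env_vars(names: List[str]) -> List[str]:
    """Return the names that are unset or set to an empty string (hypothetical helper)."""
    return [name for name in names if not os.environ.get(name)]


if __name__ == "__main__":
    missing = missing_env_vars(["ARIZE_API_KEY", "ARIZE_SPACE_KEY"])
    if missing:
        print("Set these environment variables first:", ", ".join(missing))
    else:
        print("Arize credentials found.")
```

Running this before the logging script tells you exactly which variable still needs to be set.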
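Use the Arize Python SDK to create a client, describe your DataFrame's columns with a Schema, and log your LLM's inputs, predictions, and optional ground-truth labels. This example logs a small batch of LLM outputs for analysis.
import os

import pandas as pd
from arize.pandas.logger import Client
from arize.utils.types import Environments, ModelTypes, Schema

# Authenticate with the keys set during setup
# (newer arize releases may expect space_id instead of space_key; check your installed version)
client = Client(api_key=os.environ["ARIZE_API_KEY"], space_key=os.environ["ARIZE_SPACE_KEY"])

# One row per LLM call: the prompt, the model's output, and the expected output
data = pd.DataFrame({
    "prediction_id": ["1", "2"],
    "text_input": ["Say hello", "Say goodbye"],
    "prediction_label": ["Hello, world!", "Goodbye, world!"],
    "actual_label": ["Hello, world!", "Farewell, world!"],
})

# Map each DataFrame column to its role
schema = Schema(
    prediction_id_column_name="prediction_id",
    prediction_label_column_name="prediction_label",
    actual_label_column_name="actual_label",
    feature_column_names=["text_input"],
)

# Upload the batch; each distinct output string is treated as a class label
response = client.log(
    dataframe=data,
    schema=schema,
    model_id="llm-example-model",
    model_version="v1.0",
    model_type=ModelTypes.SCORE_CATEGORICAL,
    environment=Environments.PRODUCTION,
)

# log() returns an HTTP response; status 200 means the upload was accepted
if response.status_code == 200:
    print("Logged LLM evaluation data to Arize.")
else:
    print(f"Logging failed ({response.status_code}): {response.text}")
When the upload succeeds, the script prints:
Logged LLM evaluation data to Arize.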
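Because the batch carries ground truth, you can also compute a quick local metric alongside the upload. The sketch below computes exact-match accuracy between predictions and ground-truth labels, optionally ignoring case and extra whitespace. exact_match_accuracy and its normalize flag are hypothetical names, and exact match is only one possible metric for free-text LLM output.

```python
from typing import Sequence


def exact_match_accuracy(predictions: Sequence[str], actuals: Sequence[str], normalize: bool = True) -> float:
    """Return the fraction of rows whose prediction matches the ground-truth label."""
    if len(predictions) != len(actuals):
        raise ValueError("predictions and actuals must have the same length")
    if len(predictions) == 0:
        return 0.0

    def canon(text: str) -> str:
        # With normalize=True, ignore case and collapse runs of whitespace
        return " ".join(text.lower().split()) if normalize else text

    matches = sum(canon(p) == canon(a) for p, a in zip(predictions, actuals))
    return matches / len(predictions)


if __name__ == "__main__":
    # The same two sample rows used in the logging example above
    predictions = ["Hello, world!", "Goodbye, world!"]
    actuals = ["Hello, world!", "Farewell, world!"]
    print(f"Exact-match accuracy: {exact_match_accuracy(predictions, actuals):.2f}")
```

On the sample rows this prints Exact-match accuracy: 0.50, since only the first response matches its label. With a DataFrame you can pass data["prediction_label"] and data["actual_label"] directly, because the function only uses len() and iteration.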
Common variations
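- For high-throughput scenarios, log data in several smaller batches instead of one very large call, and check the SDK reference for your installed version for any asynchronous logging options.
- Log additional features, such as token probabilities or embeddings, for deeper analysis.
- Integrate with any LLM provider by capturing its outputs and logging them to Arize in the same format.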
- Use Arize's dashboard to monitor model drift, error rates, and data quality over time.
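Many LLM APIs can return per-token log-probabilities. One way to turn them into loggable features is to summarize each response by its token count, mean token probability, and least confident token. This sketch assumes you already have a list of natural-log probabilities per response; token_prob_features, the output keys, and the sample values are all hypothetical.

```python
import math
from typing import Dict, List


def token_prob_features(logprobs: List[float]) -> Dict[str, float]:
    """Summarize one response's token log-probabilities as scalar features (hypothetical helper)."""
    if not logprobs:
        return {"token_count": 0, "mean_token_prob": float("nan"), "min_token_prob": float("nan")}
    probs = [math.exp(lp) for lp in logprobs]  # convert natural-log probabilities to probabilities
    return {
        "token_count": len(probs),
        "mean_token_prob": sum(probs) / len(probs),
        "min_token_prob": min(probs),  # the least confident token in the response
    }


if __name__ == "__main__":
    # Made-up log-probabilities for two responses, for illustration only
    per_response = [[-0.01, -0.2, -0.05], [-0.3, -1.2]]
    for features in map(token_prob_features, per_response):
        print(features)
```

To log these, add each key as an extra DataFrame column (one row per response) and log the columns as features next to text_input.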
Troubleshooting
- If you see authentication errors, verify that the ARIZE_API_KEY and ARIZE_SPACE_KEY environment variables are set correctly.
- If data is missing from the dashboard, make sure your DataFrame's column names match the columns declared in the schema you log with.
- If logging fails on large batches, split the data into smaller chunks and log each chunk separately.
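The schema and batch-size checks above can be automated before calling the SDK. This standard-library sketch makes no assumptions about the arize API: missing_columns and chunk_ranges are hypothetical helpers that report absent columns and split n rows into consecutive fixed-size index ranges.

```python
from typing import Iterable, Iterator, List


def missing_columns(columns: Iterable[str], required: Iterable[str]) -> List[str]:
    """Return required column names that are absent, in the order they were required."""
    present = set(columns)
    return [name for name in required if name not in present]


def chunk_ranges(n_rows: int, chunk_size: int) -> Iterator[range]:
    """Yield consecutive row-index ranges covering 0..n_rows-1, each at most chunk_size long."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for start in range(0, n_rows, chunk_size):
        yield range(start, min(start + chunk_size, n_rows))


if __name__ == "__main__":
    columns = ["prediction_id", "prediction_label", "text_input"]
    required = ["prediction_id", "prediction_label", "actual_label", "text_input"]
    print("Missing columns:", missing_columns(columns, required))
    for rows in chunk_ranges(n_rows=10, chunk_size=4):
        print(rows.start, rows.stop)
```

With a DataFrame, pass data.columns to missing_columns, and log data.iloc[rows.start:rows.stop] once per range to upload a large batch in pieces.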
Key Takeaways
- Use the official arize Python SDK to log LLM inputs, predictions, and ground truth for evaluation.
- Set the ARIZE_API_KEY and ARIZE_SPACE_KEY environment variables to authenticate your client.
- Use Arize's dashboard to monitor model performance, detect drift, and analyze errors over time.
- For high-volume logging, split your data into smaller batches to avoid timeouts.
- Ensure your data schema aligns with Arize's requirements for accurate logging and visualization.