How to · Beginner · 3 min read

How to use Arize Phoenix for LLM evaluation

Quick answer
Log your LLM's inputs, predictions, and ground truth labels with the Arize Python SDK, then analyze model performance, drift, and errors in Arize's dashboard. (Note that Arize Phoenix itself is Arize's open-source, locally run observability library; the example below sends evaluation data to the hosted Arize platform via the arize SDK.)

PREREQUISITES

  • Python 3.8+
  • Arize API key and space key (sign up at arize.com)
  • pip install arize
  • Basic familiarity with LLM inference and evaluation

Setup

Install the arize Python package and set your Arize API key and space key as environment variables (ARIZE_API_KEY and ARIZE_SPACE_KEY) so the client can authenticate.

bash
pip install arize
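The client reads its credentials from the environment. One minimal way to set them in your shell (the values shown are placeholders; substitute your own keys from the Arize console):

```shell
export ARIZE_API_KEY="your-api-key"
export ARIZE_SPACE_KEY="your-space-key"
```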

Step by step

Use the Arize Python SDK to create a client, then log your LLM's inputs, predictions, and optional ground truth labels for evaluation. This example shows how to log a batch of LLM outputs for analysis.

python
import os
import pandas as pd
from arize.pandas.logger import Client
from arize.utils.types import Schema, ModelTypes, Environments

# Initialize the Arize client with the API key and space key from the environment
client = Client(
    api_key=os.environ['ARIZE_API_KEY'],
    space_key=os.environ['ARIZE_SPACE_KEY'],
)

# Prepare a sample dataframe with LLM inputs, predictions, and ground truth
data = pd.DataFrame({
    'prediction_id': ['1', '2'],
    'prediction_label': ['Hello, world!', 'Goodbye, world!'],
    'actual_label': ['Hello, world!', 'Farewell, world!'],
    'text_input': ['Say hello', 'Say goodbye']
})

# A Schema tells Arize which dataframe column plays which role
schema = Schema(
    prediction_id_column_name='prediction_id',
    prediction_label_column_name='prediction_label',
    actual_label_column_name='actual_label',
    feature_column_names=['text_input'],
)

# Log the batch to Arize for evaluation
response = client.log(
    dataframe=data,
    schema=schema,
    model_id='llm-example-model',
    model_version='v1.0',
    model_type=ModelTypes.SCORE_CATEGORICAL,
    environment=Environments.PRODUCTION,
)

print("Logged LLM evaluation data to Arize.")
output
Logged LLM evaluation data to Arize.

Common variations

  • For high-throughput or streaming scenarios, log single records with the streaming client in arize.api, which sends data asynchronously, or split large dataframes into smaller batches.
  • Log additional features like token probabilities or embeddings for deeper analysis.
  • Integrate with different LLM providers by capturing their outputs and sending to Arize.
  • Use Arize's dashboard to monitor model drift, error rates, and data quality over time.
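As a sketch of the provider-integration variation above: capture any LLM client's prompt/response pairs and shape them into the columns the logging example expects. The responses list here is a hypothetical placeholder standing in for real provider output:

```python
import uuid
import pandas as pd

# Hypothetical captured responses from any LLM provider's client
responses = [
    {"prompt": "Say hello", "completion": "Hello, world!"},
    {"prompt": "Say goodbye", "completion": "Goodbye, world!"},
]

# Shape them into the column layout used for logging
records = pd.DataFrame({
    "prediction_id": [str(uuid.uuid4()) for _ in responses],
    "text_input": [r["prompt"] for r in responses],
    "prediction_label": [r["completion"] for r in responses],
})
print(records.shape)  # (2, 3)
```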

Troubleshooting

  • If you see authentication errors, verify your ARIZE_API_KEY and ARIZE_SPACE_KEY environment variables are set correctly.
  • For missing data in the dashboard, ensure your dataframe columns match Arize's expected schema.
  • If logging fails on large batches, try splitting the dataframe into smaller chunks and logging each chunk separately.
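The chunking workaround can be sketched as a small helper (chunk_dataframe is an illustrative name, not part of the SDK); each yielded slice can then be passed to client.log() in turn:

```python
import pandas as pd

def chunk_dataframe(df, chunk_size):
    """Yield successive chunk_size-row slices of a dataframe."""
    for start in range(0, len(df), chunk_size):
        yield df.iloc[start:start + chunk_size]

data = pd.DataFrame({"prediction_id": [str(i) for i in range(5)]})
sizes = [len(chunk) for chunk in chunk_dataframe(data, 2)]
print(sizes)  # [2, 2, 1]
```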

Key Takeaways

  • Use the official arize Python SDK to log LLM inputs, predictions, and ground truth for evaluation.
  • Set ARIZE_API_KEY and ARIZE_SPACE_KEY environment variables to authenticate your client.
  • Leverage Arize's dashboard to monitor model performance, detect drift, and analyze errors over time.
  • For high-volume logging, send data in smaller batches to avoid timeouts.
  • Ensure your data schema aligns with Arize's requirements for accurate logging and visualization.
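A quick pre-flight check along the lines of the last takeaway: verify the dataframe contains every column your schema references before logging. The missing_columns helper is illustrative, not an SDK function:

```python
import pandas as pd

def missing_columns(df, required):
    """Return required column names absent from the dataframe."""
    return sorted(set(required) - set(df.columns))

required = ["prediction_id", "prediction_label", "actual_label", "text_input"]
df = pd.DataFrame({"prediction_id": ["1"], "prediction_label": ["Hello"]})
print(missing_columns(df, required))  # ['actual_label', 'text_input']
```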
Verified 2026-04