LangSmith evaluation metrics explained
Quick answer
LangSmith records evaluation results as feedback scores attached to runs; evaluators commonly log metrics such as accuracy, precision, recall, and F1 score. Use the LangSmith Python SDK to retrieve and analyze these scores programmatically in your AI workflows.
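For reference, these four metrics are simple functions of the confusion counts (true/false positives and negatives); a self-contained sketch for binary labels:

```python
def classification_metrics(y_true, y_pred):
    """Compute accuracy, precision, recall, and F1 for binary labels (0/1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

print(classification_metrics([1, 0, 1, 1, 0], [1, 0, 0, 1, 1]))
```

F1 is the harmonic mean of precision and recall, so it penalizes a model that trades one heavily for the other.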
Prerequisites
- Python 3.8+
- LangSmith API key
- pip install langsmith
Setup
Install the langsmith Python package and set your API key in the LANGSMITH_API_KEY environment variable. The client uses this key to authenticate calls to the LangSmith API.
pip install langsmith
Step by step
Use the LangSmith Python SDK to fetch the runs in a project and the evaluation feedback attached to them. The example below prints each logged score, such as accuracy, precision, recall, or F1, depending on what your evaluators record.
import os
from langsmith import Client

# Initialize the LangSmith client with the API key from the environment
client = Client(api_key=os.environ["LANGSMITH_API_KEY"])

# Replace with the name of your tracing project
project_name = "your-project-name"

# Fetch the runs in the project, then the feedback (evaluation scores) on each
for run in client.list_runs(project_name=project_name):
    for feedback in client.list_feedback(run_ids=[run.id]):
        print(f"{run.name}: {feedback.key} = {feedback.score}")
output
sentiment_analysis: accuracy = 0.92
sentiment_analysis: precision = 0.9
sentiment_analysis: recall = 0.91
sentiment_analysis: f1 = 0.905
named_entity_recognition: accuracy = 0.88
named_entity_recognition: precision = 0.85
named_entity_recognition: recall = 0.87
named_entity_recognition: f1 = 0.86
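Feedback objects carry a key and a numeric score, so once they are fetched, summarizing them is plain Python. A minimal sketch that averages scores per metric key (the sample values are illustrative, not pulled from a real project):

```python
from collections import defaultdict

def average_scores(feedback):
    """Average feedback scores per metric key; feedback is (key, score) pairs."""
    totals = defaultdict(lambda: [0.0, 0])
    for key, score in feedback:
        totals[key][0] += score
        totals[key][1] += 1
    return {key: total / count for key, (total, count) in totals.items()}

# Hypothetical scores as they might be logged across several runs
scores = [("accuracy", 0.92), ("accuracy", 0.88), ("f1", 0.905), ("f1", 0.86)]
print(average_scores(scores))
```

The same pattern works for any feedback key your evaluators log, not just the four headline metrics.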
Common variations
You can also filter feedback by run or by key, and evaluators can log custom metrics (for example ROC AUC) under their own feedback keys, retrieved the same way. To call the SDK from async code, run the blocking client off the event loop with asyncio.to_thread (recent SDK versions also ship an AsyncClient with a mirrored interface).
import asyncio
import os
from langsmith import Client

client = Client(api_key=os.environ["LANGSMITH_API_KEY"])

def fetch_f1(project_name):
    # Blocking SDK calls: collect (run name, F1 score) pairs for the project
    return [(run.name, fb.score)
            for run in client.list_runs(project_name=project_name)
            for fb in client.list_feedback(run_ids=[run.id])
            if fb.key == "f1"]

async def main():
    # asyncio.to_thread keeps the event loop free while the SDK blocks
    for name, score in await asyncio.to_thread(fetch_f1, "your-project-name"):
        print(f"Async Metric: {name} - F1: {score}")

asyncio.run(main())
output
Async Metric: sentiment_analysis - F1: 0.905
Async Metric: named_entity_recognition - F1: 0.86
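The same async pattern extends to polling several projects at once with asyncio.gather. A self-contained sketch in which fetch_project_scores is a stub standing in for the real SDK calls (the project names and hard-coded score are illustrative):

```python
import asyncio

async def fetch_project_scores(project_name):
    # Stub standing in for the real (blocking or async) SDK calls
    await asyncio.sleep(0.01)
    return project_name, {"f1": 0.9}

async def main():
    # Query several projects concurrently instead of one after another
    results = await asyncio.gather(
        fetch_project_scores("sentiment"),
        fetch_project_scores("ner"),
    )
    return dict(results)

print(asyncio.run(main()))
```

Because gather runs the coroutines concurrently, total latency is roughly that of the slowest fetch rather than the sum of all of them.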
Troubleshooting
- If you see authentication errors, verify your LANGSMITH_API_KEY environment variable is set correctly.
- If no metrics appear, confirm the project name is valid and that evaluations have been logged.
- For network issues, check your internet connection and firewall settings.
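For the first and last points, a little defensive code helps: fail fast when the key is missing, and retry transient network failures with backoff. A generic sketch (not LangSmith-specific; the function names are our own):

```python
import os
import time

def require_api_key():
    """Fail fast with a clear message when the key is missing."""
    key = os.environ.get("LANGSMITH_API_KEY")
    if not key:
        raise RuntimeError("LANGSMITH_API_KEY is not set; export it before running.")
    return key

def with_retries(fn, attempts=3, base_delay=1.0):
    """Retry a flaky network call, doubling the delay after each failure."""
    for attempt in range(attempts):
        try:
            return fn()
        except ConnectionError:
            if attempt == attempts - 1:
                raise
            time.sleep(base_delay * 2 ** attempt)
```

Wrap your metric-fetching call, e.g. `with_retries(lambda: list(client.list_runs(project_name="your-project-name")))`, so one dropped connection does not kill the whole job.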
Key Takeaways
- Use the LangSmith Python SDK to programmatically access evaluation metrics like accuracy and F1 score.
- Set your API key securely via environment variables to authenticate LangSmith API calls.
- LangSmith supports both synchronous and asynchronous metric retrieval for flexible integration.