How-to · Beginner · 3 min read

LangSmith evaluation metrics explained

Quick answer
LangSmith records evaluation results as feedback scores attached to runs; common examples include accuracy, precision, recall, and F1 score, depending on which evaluators you run. Use the LangSmith Python SDK to retrieve and analyze these scores programmatically in your AI workflows.
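As a refresher on what those numbers mean, precision, recall, and F1 are simple functions of true/false positive and negative counts. A minimal sketch (the helper below is illustrative, not part of the LangSmith SDK):

```python
def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall, and F1 from raw prediction counts."""
    precision = tp / (tp + fp)  # fraction of predicted positives that are correct
    recall = tp / (tp + fn)     # fraction of actual positives that were found
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of the two
    return precision, recall, f1

# Example: 9 true positives, 1 false positive, 1 false negative
p, r, f1 = precision_recall_f1(tp=9, fp=1, fn=1)
print(p, r, round(f1, 3))  # 0.9 0.9 0.9
```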

PREREQUISITES

  • Python 3.8+
  • LangSmith API key
  • pip install langsmith

Setup

Install the langsmith Python package and set your API key as an environment variable to authenticate. The client reads LANGSMITH_API_KEY from the environment automatically, so the key never needs to appear in your code.

```bash
pip install langsmith
```
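Then export the key in your shell (macOS/Linux shown; substitute your real key for the placeholder):

```shell
export LANGSMITH_API_KEY="your-api-key"
```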

Step by step

Use the LangSmith Python SDK to fetch evaluation results for a project. The example below lists recent runs and prints the metric scores (for example accuracy, precision, recall, and F1) that your evaluators have logged as feedback.

```python
import os
from langsmith import Client

# Initialize the client. Passing the key explicitly is optional: the SDK
# also reads LANGSMITH_API_KEY from the environment on its own.
client = Client(api_key=os.environ["LANGSMITH_API_KEY"])

# Replace with your actual project name
project_name = "your-project-name"

# List recent runs in the project and print their feedback scores.
# Metric names (accuracy, precision, ...) are whatever keys your
# evaluators log; LangSmith does not fix them in advance.
for run in client.list_runs(project_name=project_name, limit=10):
    print(f"Run: {run.name}")
    for feedback in client.list_feedback(run_ids=[run.id]):
        print(f"  {feedback.key}: {feedback.score}")
```

output
```
Run: sentiment-analysis
  accuracy: 0.92
  precision: 0.9
  recall: 0.91
  f1_score: 0.905

Run: named-entity-recognition
  accuracy: 0.88
  precision: 0.85
  recall: 0.87
  f1_score: 0.86
```
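Once individual scores are retrieved, a common next step is averaging them per metric across runs. A standard-library sketch (the sample scores are illustrative, not real SDK output):

```python
from collections import defaultdict
from statistics import mean

# (metric_key, score) pairs, shaped like collected feedback records
scores = [
    ("accuracy", 0.92), ("accuracy", 0.88),
    ("f1_score", 0.905), ("f1_score", 0.86),
]

# Group scores by metric key, then average each group
by_key = defaultdict(list)
for key, score in scores:
    by_key[key].append(score)

averages = {key: mean(values) for key, values in by_key.items()}
print(averages)
```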

Common variations

You can retrieve results asynchronously or filter by specific runs. Because feedback keys are arbitrary, evaluators can log any metric you compute (for example, ROC AUC) alongside the standard ones; adjust the SDK calls accordingly.

One dependable async pattern is to wrap the synchronous client in asyncio.to_thread so the blocking SDK calls run off the event loop (recent SDK versions also ship an AsyncClient; check your version's documentation):

```python
import asyncio
import os
from langsmith import Client

def fetch_scores(project_name):
    """Synchronously collect (run name, metric key, score) tuples."""
    client = Client(api_key=os.environ["LANGSMITH_API_KEY"])
    results = []
    for run in client.list_runs(project_name=project_name, limit=10):
        for fb in client.list_feedback(run_ids=[run.id]):
            results.append((run.name, fb.key, fb.score))
    return results

async def main():
    # Run the blocking SDK calls in a worker thread so the loop stays free
    scores = await asyncio.to_thread(fetch_scores, "your-project-name")
    for run_name, key, score in scores:
        print(f"Async: {run_name} - {key}: {score}")

asyncio.run(main())
```

output
```
Async: sentiment-analysis - f1_score: 0.905
Async: named-entity-recognition - f1_score: 0.86
```

Troubleshooting

  • If you see authentication errors, verify your LANGSMITH_API_KEY environment variable is set correctly.
  • If no metrics appear, confirm the project ID is valid and that evaluations have been logged.
  • For network issues, check your internet connection and firewall settings.
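For the first bullet, it can help to fail fast with a clear message before constructing the client (a small illustrative helper, not part of the SDK):

```python
import os

def require_api_key(var="LANGSMITH_API_KEY"):
    """Return the API key, or raise a descriptive error if it is unset or empty."""
    key = os.environ.get(var, "").strip()
    if not key:
        raise RuntimeError(f"{var} is not set; export it before running.")
    return key
```

Calling require_api_key() at startup surfaces a missing key as one clear error instead of an opaque 401 later on.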

Key Takeaways

  • Use the LangSmith Python SDK to programmatically access evaluation metrics like accuracy and F1 score.
  • Set your API key securely via environment variables to authenticate LangSmith API calls.
  • LangSmith supports both synchronous and asynchronous metric retrieval for flexible integration.