# How to use LangSmith evaluation

## Quick answer

Use LangSmith evaluation by installing the langsmith Python SDK, configuring your API key, and creating evaluation runs with model outputs and references. The SDK provides methods to log predictions, references, and metadata, enabling detailed analysis of AI model performance.

## Prerequisites

- Python 3.8+
- LangSmith account and API key
- `pip install langsmith`
## Setup

Install the langsmith Python SDK and set your API key as an environment variable to authenticate requests:

```shell
pip install langsmith
export LANGSMITH_API_KEY="<your-api-key>"
```

## Step by step
This example creates a dataset of inputs and reference outputs, defines a target function that wraps the model under test, and runs the evaluation with a simple exact-match evaluator.

```python
import os
from langsmith import Client
from langsmith.evaluation import evaluate

# Initialize the LangSmith client with the API key from the environment
client = Client(api_key=os.environ["LANGSMITH_API_KEY"])

# Create a dataset of inputs and reference outputs
dataset = client.create_dataset(dataset_name="My Model Evaluation")
examples = [
    {"question": "What is the capital of France?", "answer": "Paris"},
    {"question": "Who wrote Hamlet?", "answer": "William Shakespeare"},
    {"question": "What is 2+2?", "answer": "4"},
]
client.create_examples(
    inputs=[{"question": ex["question"]} for ex in examples],
    outputs=[{"answer": ex["answer"]} for ex in examples],
    dataset_id=dataset.id,
)

# The target wraps the model being evaluated
def target(inputs: dict) -> dict:
    return {"answer": run_my_model(inputs["question"])}  # call your model here

# A simple evaluator comparing the prediction to the reference
def exact_match(run, example) -> dict:
    return {"key": "exact_match",
            "score": run.outputs["answer"] == example.outputs["answer"]}

# Run the evaluation
results = evaluate(target, data="My Model Evaluation", evaluators=[exact_match])
print(f"Submitted {len(examples)} examples for evaluation.")
```

Output:

```
Submitted 3 examples for evaluation.
```
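Regardless of how results are logged, the exact-match comparison itself is plain Python and can be checked locally before any submission. A minimal sketch (the `score_exact_match` helper is illustrative, not part of the SDK):

```python
def score_exact_match(prediction: str, reference: str) -> float:
    """Return 1.0 if prediction matches reference after normalization, else 0.0."""
    normalize = lambda s: s.strip().lower()
    return 1.0 if normalize(prediction) == normalize(reference) else 0.0

examples = [
    {"input": "What is the capital of France?", "prediction": "Paris", "reference": "Paris"},
    {"input": "Who wrote Hamlet?", "prediction": "Shakespeare", "reference": "William Shakespeare"},
    {"input": "What is 2+2?", "prediction": "4", "reference": "4"},
]

scores = [score_exact_match(ex["prediction"], ex["reference"]) for ex in examples]
accuracy = sum(scores) / len(scores)
print(f"Exact-match accuracy: {accuracy:.2f}")  # 2 of 3 predictions match exactly
```

Note that "Shakespeare" fails against the reference "William Shakespeare" — a reminder that exact match is strict, and fuzzier metrics may suit free-form answers better.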
## Common variations

You can customize evaluations by attaching metadata, using different evaluation metrics, or running submissions asynchronously. LangSmith supports multiple models and batch evaluations.
```python
import asyncio
import os
from langsmith import Client
from langsmith.evaluation import aevaluate

async def async_evaluation():
    client = Client(api_key=os.environ["LANGSMITH_API_KEY"])

    async def target(inputs: dict) -> dict:
        return {"answer": await run_my_model_async(inputs["question"])}  # call your model here

    def exact_match(run, example) -> dict:
        return {"key": "exact_match",
                "score": run.outputs["answer"] == example.outputs["answer"]}

    # Evaluate an existing dataset asynchronously
    await aevaluate(target, data="Async Eval", evaluators=[exact_match])
    print("Async evaluation run 'Async Eval' submitted.")

asyncio.run(async_evaluation())
```

Output:

```
Async evaluation run 'Async Eval' submitted.
```
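The concurrency pattern behind batch submission can be sketched with plain asyncio, independent of any SDK. Here `submit_one` is a stand-in for a real network call, and a semaphore bounds how many submissions run at once (all names are illustrative):

```python
import asyncio

async def submit_one(example: dict, sem: asyncio.Semaphore) -> str:
    # Stand-in for a real network submission; replace with an SDK call.
    async with sem:
        await asyncio.sleep(0)  # simulate I/O
        return f"submitted: {example['input']}"

async def submit_batch(examples: list[dict], max_concurrency: int = 5) -> list[str]:
    # gather preserves input order even though tasks run concurrently
    sem = asyncio.Semaphore(max_concurrency)
    return await asyncio.gather(*(submit_one(ex, sem) for ex in examples))

batch = [{"input": f"question {i}"} for i in range(3)]
results = asyncio.run(submit_batch(batch))
print(results)
```

Bounding concurrency this way keeps large batches from overwhelming rate limits while still overlapping network round trips.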
## Troubleshooting

- If you see authentication errors, verify your `LANGSMITH_API_KEY` environment variable is set correctly.
- For network issues, check your internet connection and firewall settings.
- If evaluation examples fail to submit, ensure inputs, predictions, and references are non-empty strings.
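The last check can be automated before submission. A small validation helper along these lines (illustrative, not part of the SDK) catches empty or missing fields early:

```python
def validate_example(ex: dict) -> list[str]:
    """Return a list of problems with an example; an empty list means it is valid."""
    problems = []
    for field in ("input", "prediction", "reference"):
        value = ex.get(field)
        if not isinstance(value, str) or not value.strip():
            problems.append(f"{field} must be a non-empty string")
    return problems

good = {"input": "What is 2+2?", "prediction": "4", "reference": "4"}
bad = {"input": "What is 2+2?", "prediction": "", "reference": None}

print(validate_example(good))  # []
print(validate_example(bad))   # ['prediction must be a non-empty string', 'reference must be a non-empty string']
```

Running such a check over the whole batch before submitting gives a clear error message per example instead of a failed API call.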
## Key Takeaways
- Install and authenticate with the LangSmith Python SDK using your API key.
- Create evaluation runs by adding inputs, model predictions, and reference outputs.
- Submit evaluations to analyze model performance and track results.
- Use async methods for scalable or batch evaluation workflows.
- Check environment variables and input data format to avoid common errors.