# How to use LangSmith evaluation

## Quick answer

Use LangSmith evaluation by installing the langsmith Python SDK, configuring your API key, and creating evaluation runs with model outputs and references. The SDK provides methods to log predictions, references, and metadata, enabling detailed analysis of AI model performance.

## Prerequisites

- Python 3.8+
- LangSmith account and API key
- `pip install langsmith`
## Setup

Install the langsmith Python SDK and set your API key as an environment variable to authenticate requests:

```shell
pip install langsmith
export LANGSMITH_API_KEY="<your-api-key>"
```

## Step by step
This example creates a dataset of inputs and reference outputs, defines a target function that wraps the model under test, and runs the evaluation with a simple exact-match evaluator.

```python
import os
from langsmith import Client
from langsmith.evaluation import evaluate

# Initialize the LangSmith client with the API key from the environment
client = Client(api_key=os.environ["LANGSMITH_API_KEY"])

# Create a dataset of inputs and reference outputs
dataset = client.create_dataset(dataset_name="My Model Evaluation")
examples = [
    {"question": "What is the capital of France?", "answer": "Paris"},
    {"question": "Who wrote Hamlet?", "answer": "William Shakespeare"},
    {"question": "What is 2+2?", "answer": "4"},
]
client.create_examples(
    inputs=[{"question": ex["question"]} for ex in examples],
    outputs=[{"answer": ex["answer"]} for ex in examples],
    dataset_id=dataset.id,
)

# The target wraps the model being evaluated
def target(inputs: dict) -> dict:
    return {"answer": run_my_model(inputs["question"])}  # call your model here

# A simple evaluator comparing the prediction to the reference
def exact_match(run, example) -> dict:
    return {"key": "exact_match",
            "score": run.outputs["answer"] == example.outputs["answer"]}

# Run the evaluation
results = evaluate(target, data="My Model Evaluation", evaluators=[exact_match])
print(f"Submitted {len(examples)} examples for evaluation.")
```

Output:

```
Submitted 3 examples for evaluation.
```
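Regardless of how results are logged, the exact-match comparison itself is plain Python and can be checked locally before any submission. A minimal sketch (the `score_exact_match` helper is illustrative, not part of the SDK):

```python
def score_exact_match(prediction: str, reference: str) -> float:
    """Return 1.0 if prediction matches reference after normalization, else 0.0."""
    normalize = lambda s: s.strip().lower()
    return 1.0 if normalize(prediction) == normalize(reference) else 0.0

examples = [
    {"input": "What is the capital of France?", "prediction": "Paris", "reference": "Paris"},
    {"input": "Who wrote Hamlet?", "prediction": "Shakespeare", "reference": "William Shakespeare"},
    {"input": "What is 2+2?", "prediction": "4", "reference": "4"},
]

scores = [score_exact_match(ex["prediction"], ex["reference"]) for ex in examples]
accuracy = sum(scores) / len(scores)
print(f"Exact-match accuracy: {accuracy:.2f}")  # 2 of 3 predictions match exactly
```

Note that "Shakespeare" fails against the reference "William Shakespeare" — a reminder that exact match is strict, and fuzzier metrics may suit free-form answers better.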
## Common variations

You can customize evaluations by attaching metadata, using different evaluation metrics, or running submissions asynchronously. LangSmith supports multiple models and batch evaluations.
```python
import asyncio
import os
from langsmith import Client
from langsmith.evaluation import aevaluate

async def async_evaluation():
    client = Client(api_key=os.environ["LANGSMITH_API_KEY"])

    async def target(inputs: dict) -> dict:
        return {"answer": await run_my_model_async(inputs["question"])}  # call your model here

    def exact_match(run, example) -> dict:
        return {"key": "exact_match",
                "score": run.outputs["answer"] == example.outputs["answer"]}

    # Evaluate an existing dataset asynchronously
    await aevaluate(target, data="Async Eval", evaluators=[exact_match])
    print("Async evaluation run 'Async Eval' submitted.")

asyncio.run(async_evaluation())
```

Output:

```
Async evaluation run 'Async Eval' submitted.
```
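The concurrency pattern behind batch submission can be sketched with plain asyncio, independent of any SDK. Here `submit_one` is a stand-in for a real network call, and a semaphore bounds how many submissions run at once (all names are illustrative):

```python
import asyncio

async def submit_one(example: dict, sem: asyncio.Semaphore) -> str:
    # Stand-in for a real network submission; replace with an SDK call.
    async with sem:
        await asyncio.sleep(0)  # simulate I/O
        return f"submitted: {example['input']}"

async def submit_batch(examples: list[dict], max_concurrency: int = 5) -> list[str]:
    # gather preserves input order even though tasks run concurrently
    sem = asyncio.Semaphore(max_concurrency)
    return await asyncio.gather(*(submit_one(ex, sem) for ex in examples))

batch = [{"input": f"question {i}"} for i in range(3)]
results = asyncio.run(submit_batch(batch))
print(results)
```

Bounding concurrency this way keeps large batches from overwhelming rate limits while still overlapping network round trips.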
## Troubleshooting

- If you see authentication errors, verify your `LANGSMITH_API_KEY` environment variable is set correctly.
- For network issues, check your internet connection and firewall settings.
- If evaluation examples fail to submit, ensure inputs, predictions, and references are non-empty strings.
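The last check can be automated before submission. A small validation helper along these lines (illustrative, not part of the SDK) catches empty or missing fields early:

```python
def validate_example(ex: dict) -> list[str]:
    """Return a list of problems with an example; an empty list means it is valid."""
    problems = []
    for field in ("input", "prediction", "reference"):
        value = ex.get(field)
        if not isinstance(value, str) or not value.strip():
            problems.append(f"{field} must be a non-empty string")
    return problems

good = {"input": "What is 2+2?", "prediction": "4", "reference": "4"}
bad = {"input": "What is 2+2?", "prediction": "", "reference": None}

print(validate_example(good))  # []
print(validate_example(bad))   # ['prediction must be a non-empty string', 'reference must be a non-empty string']
```

Running such a check over the whole batch before submitting gives a clear error message per example instead of a failed API call.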
## Key Takeaways
- Install and authenticate with the LangSmith Python SDK using your API key.
- Create evaluation runs by adding inputs, model predictions, and reference outputs.
- Submit evaluations to analyze model performance and track results.
- Use async methods for scalable or batch evaluation workflows.
- Check environment variables and input data format to avoid common errors.