LangSmith Cheat Sheet — Debugging & Monitoring LLM Apps
from langsmith import Client
from langsmith.evaluation import evaluate
from langsmith.wrappers import wrap_openai
import os

SDK for tracing LLM calls, capturing datasets, and evaluating production chains.
Like New Relic for backend services, but for LLM chains. It records every request, shows you where latency happens, catches failures, and lets you benchmark improvements.
Core Patterns
import os

# Set tracing variables before importing LangChain so auto-instrumentation picks them up
os.environ["LANGCHAIN_TRACING_V2"] = "true"
os.environ["LANGCHAIN_PROJECT"] = "my-project"
# LANGCHAIN_API_KEY must already be exported in your shell; fail fast if it is not
if not os.environ.get("LANGCHAIN_API_KEY"):
    raise RuntimeError("LANGCHAIN_API_KEY is not set")

from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser

# All subsequent calls auto-traced
llm = ChatOpenAI(model="gpt-4o")
prompt = ChatPromptTemplate.from_template("Explain {topic} briefly")
chain = prompt | llm | StrOutputParser()
result = chain.invoke({"topic": "quantum computing"})
print(result)

Trace automatically captured in LangSmith dashboard

from langsmith import Client
from langsmith import trace
import os

client = Client(
    api_key=os.environ["LANGCHAIN_API_KEY"],
    api_url=os.environ.get("LANGCHAIN_ENDPOINT", "https://api.smith.langchain.com"),
)

# Manual trace for a custom step, using the trace context manager
with trace(
    name="my-custom-step",
    inputs={"query": "What is RAG?"},
    run_type="chain",
    client=client,
) as run:
    # Your custom logic here
    result = "RAG retrieves external documents..."
    run.end(outputs={"response": result})

Trace visible in LangSmith with custom inputs/outputs

from langsmith.wrappers import wrap_openai
from openai import OpenAI
import os
# Wrap the OpenAI client
client = wrap_openai(OpenAI(api_key=os.environ["OPENAI_API_KEY"]))
# All calls auto-traced to LangSmith
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "What is LangSmith?"}],
)
print(response.choices[0].message.content)

OpenAI call traced in LangSmith dashboard

from langsmith import Client
from langsmith.evaluation import evaluate
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
import os

client = Client(api_key=os.environ["LANGCHAIN_API_KEY"])

# Name of an existing dataset (create it first if it does not exist)
dataset_name = "sentiment-test-v1"

def evaluator(run, example):
    """Simple evaluator: check whether the output contains the expected sentiment word."""
    prediction = run.outputs.get("output", "")
    expected = example.outputs.get("sentiment", "")
    score = 1.0 if expected.lower() in prediction.lower() else 0.0
    return {"key": "sentiment_match", "score": score}

chain = (
    ChatPromptTemplate.from_template("Classify sentiment: {text}")
    | ChatOpenAI(model="gpt-4o")
    | StrOutputParser()
)

results = evaluate(
    chain.invoke,
    data=dataset_name,
    evaluators=[evaluator],
    metadata={"version": "1.0"},
    client=client,
)

# Each result row carries the evaluator feedback for one dataset example
scores = [
    r.score
    for row in results
    for r in row["evaluation_results"]["results"]
    if r.score is not None
]
print(f"Accuracy: {sum(scores) / max(len(scores), 1):.2%}")

Evaluation metrics in LangSmith dashboard

Client API Reference
| Method / Property | Description | Returns |
|---|---|---|
| Client(api_key, api_url) | Initialize LangSmith client for manual tracing and dataset operations | Client instance |
| trace(name, run_type, inputs) | Context manager (from langsmith import trace) for manual tracing. run_type: 'llm', 'chain', 'tool', 'retriever' | Run object with .end() method |
| client.create_dataset(dataset_name, description) | Create a new dataset for evaluation or examples | Dataset object |
| client.read_dataset(dataset_name=...) | Load an existing dataset by name | Dataset object (fetch its rows with client.list_examples) |
| client.delete_dataset(dataset_name=...) | Delete dataset and all associated examples | None |
| evaluate(target, data, evaluators, metadata) | Run evaluation on a chain/function against dataset examples | ExperimentResults (iterable of per-example rows with evaluator feedback) |
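The evaluators passed to evaluate() are ordinary functions of a run and an example, so they can be exercised offline before any API calls are made. The following is a minimal sketch, assuming the evaluator only reads the .outputs dict of each argument; the SimpleNamespace objects and the contains_expected name are hypothetical stand-ins, not LangSmith types.

```python
from types import SimpleNamespace


def contains_expected(run, example):
    """Score 1.0 when the expected sentiment word appears in the prediction."""
    prediction = (run.outputs or {}).get("output", "")
    expected = (example.outputs or {}).get("sentiment", "")
    # Guard against an empty label, which would otherwise match every prediction
    if not expected:
        return {"key": "contains_expected", "score": 0.0}
    hit = expected.lower() in prediction.lower()
    return {"key": "contains_expected", "score": 1.0 if hit else 0.0}


# Hypothetical stand-ins for the run/example objects an evaluation would pass in
fake_run = SimpleNamespace(outputs={"output": "The sentiment is Positive."})
fake_example = SimpleNamespace(outputs={"sentiment": "positive"})

offline_result = contains_expected(fake_run, fake_example)
print(offline_result)
```

Checking the empty-label case this way is cheap, and it catches the situation where a substring test would silently score every row as correct.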
LangSmith Configuration
Environment Variables
| Variable | Required | Default | Purpose |
|---|---|---|---|
| LANGCHAIN_TRACING_V2 | Yes (for auto-trace) | false | Enable automatic tracing of all LangChain calls |
| LANGCHAIN_API_KEY | Yes | (none) | API key from https://smith.langchain.com |
| LANGCHAIN_ENDPOINT | No | https://api.smith.langchain.com | Custom LangSmith deployment URL |
| LANGCHAIN_PROJECT | No | default | Project name in LangSmith (for organization) |
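The defaults in the table can be checked at startup so that a missing key fails loudly instead of silently dropping traces. This is a rough sketch using only the standard library; resolve_langsmith_env and DEFAULTS are hypothetical helper names, not part of the LangSmith SDK.

```python
import os

# Defaults taken from the environment variable table
DEFAULTS = {
    "LANGCHAIN_TRACING_V2": "false",
    "LANGCHAIN_ENDPOINT": "https://api.smith.langchain.com",
    "LANGCHAIN_PROJECT": "default",
}


def resolve_langsmith_env(env=None):
    """Return the effective settings, applying the documented defaults."""
    env = os.environ if env is None else env
    settings = {name: env.get(name, default) for name, default in DEFAULTS.items()}
    settings["LANGCHAIN_API_KEY"] = env.get("LANGCHAIN_API_KEY", "")
    tracing_on = settings["LANGCHAIN_TRACING_V2"].lower() == "true"
    # Tracing without a key cannot work, so surface the problem immediately
    if tracing_on and not settings["LANGCHAIN_API_KEY"]:
        raise RuntimeError("LANGCHAIN_TRACING_V2 is true but LANGCHAIN_API_KEY is empty")
    return settings


example_settings = resolve_langsmith_env(
    {"LANGCHAIN_TRACING_V2": "true", "LANGCHAIN_API_KEY": "ls_..."}
)
print(example_settings["LANGCHAIN_PROJECT"])
```

Passing a plain dict instead of os.environ keeps the check easy to unit test.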
Common Errors & Fixes
AuthenticationError: Could not authenticate with LangSmith
Cause: LANGCHAIN_API_KEY is missing, empty, or invalid

# 1. Get key from https://smith.langchain.com (top-right menu)
# 2. Set in environment
import os
os.environ["LANGCHAIN_API_KEY"] = "ls_..."

# 3. Verify in Python (constructing the client alone does not contact the API)
from langsmith import Client
try:
    client = Client(api_key=os.environ["LANGCHAIN_API_KEY"])
    next(iter(client.list_datasets()), None)  # forces one authenticated request
    print("✓ Authenticated")
except Exception as e:
    print(f"✗ Failed: {e}")

Traces not appearing in dashboard
Cause: LANGCHAIN_TRACING_V2 set after LangChain imports, or project name mismatch
# CRITICAL: Set env vars BEFORE importing LangChain
import os
os.environ["LANGCHAIN_TRACING_V2"] = "true"
os.environ["LANGCHAIN_PROJECT"] = "my-project"
# LANGCHAIN_API_KEY should already be exported in your shell
if not os.environ.get("LANGCHAIN_API_KEY"):
    raise RuntimeError("LANGCHAIN_API_KEY is not set")

# NOW import LangChain (not before)
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate

# This will be traced
chain = ChatPromptTemplate.from_template("Hello {name}") | ChatOpenAI()
result = chain.invoke({"name": "Alice"})

RuntimeError: No runs found for evaluation
Cause: Dataset doesn't exist or chain didn't produce runs during evaluation
from langsmith import Client
import os

client = Client(api_key=os.environ["LANGCHAIN_API_KEY"])

# Check if dataset exists
datasets = client.list_datasets()
dataset_names = [d.name for d in datasets]
if "my-dataset" not in dataset_names:
    # Create dataset
    dataset = client.create_dataset(
        dataset_name="my-dataset",
        description="Test dataset",
    )
    # Add examples through the client, linked to the new dataset by ID
    client.create_example(
        inputs={"text": "Hello world"},
        outputs={"result": "greeting"},
        dataset_id=dataset.id,
    )
print(f"Available datasets: {dataset_names}")

InvalidOperation: Cannot manually set run ID on active run
Cause: Trying to modify run properties after context manager started
from langsmith import Client, trace
import os

client = Client(api_key=os.environ["LANGCHAIN_API_KEY"])

# Correct: set tags and metadata when opening the run
with trace(
    name="my-run",
    inputs={"query": "test"},
    tags=["production"],
    metadata={"user_id": "123"},
    client=client,
) as run:
    result = "output"
    run.end(outputs={"result": result})

# Don't modify run after .end()
# If you need to update, create new run with updated values

Production Gotchas
LangSmith sends traces over HTTP. In high-throughput apps, this can add 50-200ms per call. Use batch flushing and set LANGCHAIN_ENDPOINT to a local proxy if running at scale. Consider sampling: not every call needs to be traced.
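One way to sample is to decide per request, deterministically, whether that request is traced. The sketch below hashes a request ID into a bucket; should_trace and the req-N IDs are hypothetical, and how the decision is wired into your tracing path depends on your application.

```python
import hashlib


def should_trace(request_id: str, sample_rate: float) -> bool:
    """Deterministically select roughly sample_rate of requests for tracing."""
    if sample_rate <= 0.0:
        return False
    if sample_rate >= 1.0:
        return True
    digest = hashlib.sha256(request_id.encode("utf-8")).digest()
    # Map the first 8 bytes of the hash to a float in [0, 1)
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    return bucket < sample_rate


sampled = sum(should_trace(f"req-{i}", 0.1) for i in range(10_000))
print(f"traced {sampled} of 10000 requests")
```

Hashing rather than calling a random number generator means the same request ID always gets the same decision, which keeps retries and multi-step requests consistently traced or untraced.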
By default, all inputs/outputs (including API keys, PII, secrets) are logged to LangSmith. Use run.end(outputs={...}) with sanitized outputs, or disable tracing for sensitive endpoints with LANGCHAIN_TRACING_V2=false conditionally.
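Sanitizing before logging can be as simple as a recursive pass over the payload. This is a minimal sketch, assuming dict-shaped inputs and outputs; the SENSITIVE_KEYS set, the email pattern, and the sanitize helper are illustrative assumptions and would need extending for real PII.

```python
import re

# Hypothetical list of keys whose values should never be logged
SENSITIVE_KEYS = {"api_key", "password", "token", "authorization"}
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


def sanitize(value):
    """Recursively redact sensitive keys and mask email addresses."""
    if isinstance(value, dict):
        return {
            k: "[REDACTED]" if str(k).lower() in SENSITIVE_KEYS else sanitize(v)
            for k, v in value.items()
        }
    if isinstance(value, (list, tuple)):
        return [sanitize(v) for v in value]
    if isinstance(value, str):
        return EMAIL_RE.sub("[EMAIL]", value)
    return value


raw = {"api_key": "sk-123", "messages": ["Contact alice@example.com today"], "n": 2}
clean = sanitize(raw)
print(clean)
```

The cleaned dict is what you would hand to run.end(outputs=...) in place of the raw payload.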
LANGCHAIN_PROJECT="MyProject" and LANGCHAIN_PROJECT="myproject" are different projects in the dashboard. Set it consistently across your codebase to avoid scattered traces.
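A quick audit can surface project names that differ only by case. The sketch below works on a plain list of names, for example ones collected from your configuration files; find_case_variants is a hypothetical helper.

```python
from collections import defaultdict


def find_case_variants(project_names):
    """Group names that are identical apart from letter case."""
    groups = defaultdict(set)
    for name in project_names:
        groups[name.lower()].add(name)
    # Keep only the groups that contain more than one distinct spelling
    return {key: sorted(names) for key, names in groups.items() if len(names) > 1}


variants = find_case_variants(["MyProject", "myproject", "billing", "MyProject"])
print(variants)
```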
If you modify a dataset in the UI (relabel examples, add new rows), your Python script won't see changes until you reload with client.read_dataset(). No auto-reload on disk/API changes.
If you have 10 evaluators and 100 dataset examples, that is 1000 evaluations, and depending on the SDK version's default they may run serially. For large evaluations, pass max_concurrency to evaluate(), or build custom parallel logic outside of evaluate().
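Custom parallel scoring outside evaluate() could look like the following sketch, which fans every (example, evaluator) pair out to a thread pool. It assumes predictions are already computed and that evaluators are plain functions of a prediction and an example; all names here are hypothetical and nothing is sent to LangSmith.

```python
from concurrent.futures import ThreadPoolExecutor


def run_evaluators_parallel(predictions, examples, evaluators, max_workers=8):
    """Score every (prediction, example) pair with every evaluator concurrently."""
    jobs = [(i, ev) for i in range(len(examples)) for ev in evaluators]

    def score(job):
        i, ev = job
        return i, ev.__name__, ev(predictions[i], examples[i])

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() preserves job order, so results line up with `jobs`
        return list(pool.map(score, jobs))


def exact_match(pred, ex):
    return 1.0 if pred == ex["label"] else 0.0


def non_empty(pred, ex):
    return 1.0 if pred.strip() else 0.0


examples = [{"label": "positive"}, {"label": "negative"}]
predictions = ["positive", "neutral"]
scores = run_evaluators_parallel(predictions, examples, [exact_match, non_empty])
print(scores)
```

Threads suit evaluators that wait on network calls, such as LLM-as-judge scoring; CPU-bound evaluators would likely benefit more from a process pool.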
LangSmith vs Alternatives
| Feature | LangSmith | Langfuse | Arize |
|---|---|---|---|
| LangChain native | ✓ Built-in | ○ Third-party | ○ Third-party |
| Tracing & debugging | ✓ Excellent | ✓ Excellent | ✓ Good |
| Dataset management | ✓ Yes | ○ Limited | ○ No |
| Evaluation framework | ✓ Built-in | ○ Basic | ○ No |
| Self-hosted option | ○ Enterprise only | ✓ Open-source | ✓ Available |
| Free tier | ✓ Generous | ✓ Generous | ○ Limited |