Cheat Sheet intermediate · 8 min read

LangSmith Cheat Sheet — Debugging & Monitoring LLM Apps

version 0.2.x

Trace, debug, and monitor LLM apps in production

LANGCHAIN_API_KEY · LANGCHAIN_ENDPOINT · LANGCHAIN_PROJECT

install pip install langsmith

core imports

python

from langsmith import Client
from langsmith.evaluation import evaluate
from langsmith.wrappers import wrap_openai
import os

Mental model

SDK for tracing LLM calls, capturing datasets, and evaluating production chains.

Like New Relic for backend services, but for LLM chains. It records every request, shows you where latency happens, catches failures, and lets you benchmark improvements.

Core Patterns

01 Enable Tracing with Environment Variables

Automatic tracing of all LangChain calls

python

import os

# Configure tracing first; the API key should already be exported in your shell
os.environ["LANGCHAIN_TRACING_V2"] = "true"
os.environ["LANGCHAIN_PROJECT"] = "my-project"
if not os.environ.get("LANGCHAIN_API_KEY"):
    raise RuntimeError("Set LANGCHAIN_API_KEY before running")

from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser

# All subsequent calls auto-traced
llm = ChatOpenAI(model="gpt-4o")
prompt = ChatPromptTemplate.from_template("Explain {topic} briefly")
chain = prompt | llm | StrOutputParser()

result = chain.invoke({"topic": "quantum computing"})
print(result)

output Trace automatically captured in LangSmith dashboard

Enable tracing before the first chain invocation. Set the variables at process start (shell, .env, or the top of the entry point); values set after chains have already run may not take effect.

02 Manual Tracing with Client

Custom tracing of non-LangChain code

python

from langsmith import Client, trace
import os

client = Client(
    api_key=os.environ["LANGCHAIN_API_KEY"],
    api_url=os.environ.get("LANGCHAIN_ENDPOINT", "https://api.smith.langchain.com")
)

# Manual trace for custom function (requires LANGCHAIN_TRACING_V2=true)
with trace(
    name="my-custom-step",
    run_type="llm",
    inputs={"query": "What is RAG?"},
    client=client
) as run:
    # Your custom logic here
    result = "RAG retrieves external documents..."
    run.end(outputs={"response": result})

output Trace visible in LangSmith with custom inputs/outputs

Call run.end(outputs=...) inside the block, or the trace is recorded without outputs. The with statement closes and submits the run on exit, including when an exception is raised.

03 Trace Native OpenAI Calls

Instrument OpenAI SDK calls without LangChain

python

from langsmith.wrappers import wrap_openai
from openai import OpenAI
import os

# Wrap the OpenAI client
client = wrap_openai(OpenAI(api_key=os.environ["OPENAI_API_KEY"]))

# All calls auto-traced to LangSmith
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "What is LangSmith?"}]
)

print(response.choices[0].message.content)

output OpenAI call traced in LangSmith dashboard

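wrap_openai returns the instrumented client. Store it in a variable and reuse it; calls made through a second, unwrapped OpenAI() instance are not traced. Tracing must still be enabled (LANGCHAIN_TRACING_V2=true) and LANGCHAIN_API_KEY set.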

04 Evaluate Chains on Datasets

Benchmark chain performance against labeled examples

python

from langsmith import Client
from langsmith.evaluation import evaluate
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
import os

client = Client(api_key=os.environ["LANGCHAIN_API_KEY"])

# Reference existing dataset or create inline
dataset_name = "sentiment-test-v1"

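def evaluator(run, example):
    """Simple evaluator: check if output contains the expected sentiment word"""
    prediction = (run.outputs or {}).get("output", "")
    expected = (example.outputs or {}).get("sentiment", "")
    # An empty label must not count as a match ("" is a substring of anything)
    matched = bool(expected) and expected.lower() in prediction.lower()
    return {"key": "contains_sentiment", "score": 1.0 if matched else 0.0}

chain = ChatPromptTemplate.from_template(
    "Classify sentiment: {text}"
) | ChatOpenAI(model="gpt-4o") | StrOutputParser()

results = evaluate(
    chain.invoke,
    data=dataset_name,
    evaluators=[evaluator],
    metadata={"version": "1.0"},
    client=client,
)

# Each result row holds the run, the example, and the evaluator results
scores = [
    r.score
    for row in results
    for r in row["evaluation_results"]["results"]
    if r.score is not None
]
print(f"Accuracy: {sum(scores) / len(scores):.2%}" if scores else "No scores")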

output Evaluation metrics in LangSmith dashboard

Evaluators should return a dict with a score (a number or a boolean); adding a key names the metric. evaluate() fails if the dataset doesn't exist: create it in the UI or with client.create_dataset().
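
Because an evaluator is a plain function of a run and an example, it can be checked locally before spending API calls on a full evaluation. The sketch below is a minimal illustration, not part of the LangSmith API: the stand-in objects are hypothetical and expose only the `.outputs` attribute the evaluator reads, and the `contains_sentiment` name is an assumption.

```python
from types import SimpleNamespace

def contains_sentiment(run, example):
    """Score 1.0 when the expected label appears in the prediction."""
    prediction = (run.outputs or {}).get("output", "")
    expected = (example.outputs or {}).get("sentiment", "")
    # Guard against empty labels: "" is a substring of every string
    matched = bool(expected) and expected.lower() in prediction.lower()
    return {"key": "contains_sentiment", "score": 1.0 if matched else 0.0}

# Hypothetical stand-ins for the run and example objects
fake_run = SimpleNamespace(outputs={"output": "The sentiment is Positive."})
fake_example = SimpleNamespace(outputs={"sentiment": "positive"})

result = contains_sentiment(fake_run, fake_example)
print(result)
```

Testing the empty-label case separately is worthwhile, since a naive substring check would score it as a match.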

Client API Reference

Method / Property	Description	Returns
`Client(api_url, api_key)`	Initialize LangSmith client for manual tracing and dataset operations	Client instance
`trace(name, run_type, inputs)`	Context manager (imported from `langsmith`) for manual tracing. run_type: 'llm', 'chain', 'tool', 'retriever'	Run object with .end() method
`client.create_dataset(dataset_name, description)`	Create a new dataset for evaluation or examples	Dataset object
`client.read_dataset(dataset_name=...)`	Load an existing dataset's metadata by name; fetch its rows with `client.list_examples(dataset_name=...)`	Dataset object
`client.delete_dataset(dataset_name=...)`	Delete dataset and all associated examples	None
`evaluate(target, data, evaluators, metadata)`	Run evaluation on a chain/function against dataset examples	ExperimentResults (iterable of per-example rows)

LangSmith Configuration

Environment Variables

Variable	Required	Default	Purpose
`LANGCHAIN_TRACING_V2`	Yes (for auto-trace)	false	Enable automatic tracing of all LangChain calls
`LANGCHAIN_API_KEY`	Yes	(none)	API key from https://smith.langchain.com
`LANGCHAIN_ENDPOINT`	No	https://api.smith.langchain.com	Custom LangSmith deployment URL
`LANGCHAIN_PROJECT`	No	default	Project name in LangSmith (for organization)
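
The table can be turned into a small fail-fast check at application startup. This is a sketch under stated assumptions: `resolve_langsmith_config` and `DEFAULTS` are hypothetical names, the defaults are copied from the table above, and the SDK itself does not require this helper.

```python
import os
from typing import Mapping

# Defaults copied from the environment variable table; names are hypothetical
DEFAULTS = {
    "LANGCHAIN_TRACING_V2": "false",
    "LANGCHAIN_ENDPOINT": "https://api.smith.langchain.com",
    "LANGCHAIN_PROJECT": "default",
}

def resolve_langsmith_config(env: Mapping[str, str]) -> dict:
    """Return the effective settings, failing fast if the API key is missing."""
    api_key = env.get("LANGCHAIN_API_KEY", "").strip()
    if not api_key:
        raise ValueError("LANGCHAIN_API_KEY is required but missing or empty")
    config = {name: env.get(name, default) for name, default in DEFAULTS.items()}
    config["LANGCHAIN_API_KEY"] = api_key
    return config

if __name__ == "__main__":
    try:
        settings = resolve_langsmith_config(os.environ)
        # Never print the key itself
        print({k: v for k, v in settings.items() if k != "LANGCHAIN_API_KEY"})
    except ValueError as exc:
        print(f"Config error: {exc}")
```

Passing the environment in as a mapping keeps the function easy to test with a plain dict.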

Common Errors & Fixes

01 AuthenticationError: Could not authenticate with LangSmith

Cause: LANGCHAIN_API_KEY is missing, empty, or invalid

Fix:

python

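# 1. Get key from https://smith.langchain.com (top-right menu)
# 2. Set in environment (prefer exporting it in your shell over hard-coding)
import os
os.environ["LANGCHAIN_API_KEY"] = "ls_..."

# 3. Verify in Python: constructing Client alone may not contact the server,
#    so issue a small API call to confirm the key works
from langsmith import Client
try:
    client = Client(api_key=os.environ["LANGCHAIN_API_KEY"])
    next(iter(client.list_datasets(limit=1)), None)
    print("✓ Authenticated")
except Exception as e:
    print(f"✗ Failed: {e}")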

02 Traces not appearing in dashboard

Cause: LANGCHAIN_TRACING_V2 is unset or was set too late (after chains had already run), or traces are going to a different project than the one you are viewing

Fix:

python

# Set env vars at the very top, before importing LangChain or running any chain
import os
os.environ["LANGCHAIN_TRACING_V2"] = "true"
os.environ["LANGCHAIN_PROJECT"] = "my-project"
assert os.environ.get("LANGCHAIN_API_KEY"), "LANGCHAIN_API_KEY is not set"

# NOW import LangChain (not before)
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate

# This will be traced
chain = ChatPromptTemplate.from_template("Hello {name}") | ChatOpenAI()
result = chain.invoke({"name": "Alice"})

03 RuntimeError: No runs found for evaluation

Cause: Dataset doesn't exist or chain didn't produce runs during evaluation

Fix:

python

from langsmith import Client
import os

client = Client(api_key=os.environ["LANGCHAIN_API_KEY"])

# Check if dataset exists
datasets = client.list_datasets()
dataset_names = [d.name for d in datasets]

if "my-dataset" not in dataset_names:
    # Create dataset
    dataset = client.create_dataset(
        dataset_name="my-dataset",
        description="Test dataset"
    )
    # Add examples through the client (Dataset objects have no add_example method)
    client.create_example(
        inputs={"text": "Hello world"},
        outputs={"result": "greeting"},
        dataset_id=dataset.id
    )

print(f"Available datasets: {dataset_names}")

04 InvalidOperation: Cannot manually set run ID on active run

Cause: Trying to modify run properties after context manager started

Fix:

python

from langsmith import Client, trace
import os

client = Client(api_key=os.environ["LANGCHAIN_API_KEY"])

# Correct: pass tags and metadata when the run is created
with trace(
    name="my-run",
    run_type="chain",
    inputs={"query": "test"},
    tags=["production"],
    metadata={"user_id": "123"},
    client=client
) as run:
    result = "output"
    run.end(outputs={"result": result})
    # Don't modify run after .end()

# If you need to update, create new run with updated values

Production Gotchas

⚠ Tracing adds latency in production

LangSmith sends traces over HTTP. The client batches uploads in a background thread by default, but in high-throughput apps or with synchronous flushing, tracing can still add 50-200 ms per call. Keep batching enabled and, at scale, point LANGCHAIN_ENDPOINT at a local proxy. Consider sampling: not every call needs to be traced.
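
Sampling can be as simple as a per-call coin flip that decides whether a request is traced. The sketch below illustrates only the decision logic; `should_trace` is a hypothetical helper, not a LangSmith function. Recent SDK versions also document a built-in sampling-rate setting, so check the docs for your version before rolling your own.

```python
import random

def should_trace(sample_rate: float, rng: random.Random) -> bool:
    """Return True for roughly `sample_rate` of calls (0.0 = never, 1.0 = always)."""
    if not 0.0 <= sample_rate <= 1.0:
        raise ValueError("sample_rate must be between 0.0 and 1.0")
    # rng.random() is in [0.0, 1.0), so 1.0 always traces and 0.0 never does
    return rng.random() < sample_rate

rng = random.Random(42)  # seeded only to make the demo repeatable
traced = sum(should_trace(0.1, rng) for _ in range(10_000))
print(f"traced {traced} of 10000 calls")
```

In a request handler, the result would gate whether the traced or untraced code path runs for that call.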

⚠ Sensitive data in traces

By default, all inputs and outputs are logged to LangSmith, including any API keys, PII, or secrets they contain. Pass sanitized values to run.end(outputs={...}), or conditionally disable tracing for sensitive endpoints with LANGCHAIN_TRACING_V2=false.
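
One way to sanitize payloads before they reach a trace is a recursive redaction pass. This is a minimal sketch: `redact`, `SENSITIVE_KEYS`, and the email pattern are illustrative assumptions and will not catch every form of PII. Recent SDK versions also accept hide_inputs and hide_outputs arguments on Client; verify against your installed version.

```python
import re
from typing import Any

# Hypothetical key list and pattern; extend for your own data
SENSITIVE_KEYS = {"api_key", "password", "token", "secret", "authorization"}
EMAIL_PATTERN = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")

def redact(value: Any) -> Any:
    """Recursively mask sensitive keys and email addresses in a payload."""
    if isinstance(value, dict):
        return {
            key: "[REDACTED]" if str(key).lower() in SENSITIVE_KEYS else redact(item)
            for key, item in value.items()
        }
    if isinstance(value, list):
        return [redact(item) for item in value]
    if isinstance(value, str):
        return EMAIL_PATTERN.sub("[EMAIL]", value)
    return value

payload = {
    "query": "Contact jane@example.com",
    "api_key": "sk-123",
    "history": [{"token": "abc"}],
}
print(redact(payload))
```

The function returns a new structure and leaves the original payload untouched, so the application can keep using the unredacted values.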

⚠ Project names are case-sensitive

LANGCHAIN_PROJECT="MyProject" and LANGCHAIN_PROJECT="myproject" are different projects in the dashboard. Set it consistently across your codebase to avoid scattered traces.
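
A quick audit can catch names that differ only by case before they scatter traces. The sketch below works on a plain list of names; `find_case_collisions` is a hypothetical helper, and how you collect the names (config files, deployment manifests) is up to you.

```python
from collections import defaultdict

def find_case_collisions(project_names):
    """Group project names that differ only by letter case."""
    groups = defaultdict(set)
    for name in project_names:
        groups[name.casefold()].add(name)
    return {key: sorted(names) for key, names in groups.items() if len(names) > 1}

collisions = find_case_collisions(["MyProject", "myproject", "staging"])
print(collisions)
```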

⚠ Dataset examples don't auto-sync

If you modify a dataset in the UI (relabel examples, add new rows), a running Python script keeps using the examples it already fetched. Re-fetch them with client.list_examples(dataset_name=...) to pick up changes; nothing reloads automatically.

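⚠ Evaluations can run sequentially by default

Depending on the SDK version, evaluate() may process examples one at a time. With 10 evaluators and 100 dataset examples, that is 1,000 evaluator calls in series. Try the max_concurrency argument of evaluate() first; build custom parallel logic outside of evaluate() only if that is not enough.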
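
If custom parallel logic is needed, a thread pool is a reasonable starting point when evaluators are I/O-bound, such as LLM-as-judge calls. The sketch below is an illustration under assumptions: `run_evaluations`, the two evaluators, and the example dicts are all hypothetical and independent of the LangSmith API.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product

def run_evaluations(examples, evaluators, max_workers=8):
    """Apply every evaluator to every example using a thread pool."""
    pairs = list(product(examples, evaluators))
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so scores line up with pairs
        scores = list(pool.map(lambda pair: pair[1](pair[0]), pairs))
    return [
        {"example_id": example["id"], "evaluator": evaluator.__name__, "score": score}
        for (example, evaluator), score in zip(pairs, scores)
    ]

# Hypothetical evaluators and examples, for illustration only
def non_empty(example):
    return 1.0 if example["output"].strip() else 0.0

def under_20_chars(example):
    return 1.0 if len(example["output"]) < 20 else 0.0

examples = [
    {"id": 1, "output": "positive"},
    {"id": 2, "output": ""},
]
rows = run_evaluations(examples, [non_empty, under_20_chars])
for row in rows:
    print(row)
```

Threads help little for CPU-bound evaluators; in that case a process pool or batching is a better fit.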

LangSmith vs Alternatives

Feature	LangSmith	Langfuse	Arize
LangChain native	✓ Built-in	○ Third-party	○ Third-party
Tracing & debugging	✓ Excellent	✓ Excellent	✓ Good
Dataset management	✓ Yes	○ Limited	○ No
Evaluation framework	✓ Built-in	○ Basic	○ No
Self-hosted option	○ Enterprise only	✓ Open-source	✓ Available
Free tier	✓ Generous	✓ Generous	○ Limited

Verified 2026-04 · v0.2.x · gpt-4o, gpt-4o-mini
