How to create datasets in LangSmith
Quick answer
Use the langsmith Python SDK to create datasets by initializing a Client and calling create_dataset with a name and optional description. This organizes your AI project data for tracking and analysis.
Prerequisites
- Python 3.8+
- pip install langsmith
- LangSmith API key set in environment variable LANGSMITH_API_KEY
Setup
Install the langsmith Python package and set your API key as an environment variable before creating datasets.
pip install langsmith
Step by step
Use the Client from langsmith to create a dataset with a name and optional description. This example shows a complete runnable script.
import os

from langsmith import Client

# Ensure your API key is set in the environment
api_key = os.environ["LANGSMITH_API_KEY"]
client = Client(api_key=api_key)

# Create a dataset (the SDK names the first parameter dataset_name)
response = client.create_dataset(
    dataset_name="My AI Project Dataset",
    description="Dataset for tracking AI model experiments",
)
print(f"Dataset created with ID: {response.id}")
Output:
Dataset created with ID: <UUID of the new dataset>
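Rerunning the script with the same name will likely be rejected, since dataset names are expected to be unique within a workspace. A minimal sketch of a get-or-create wrapper follows. The helper name get_or_create_dataset is hypothetical, and it assumes the client exposes has_dataset and read_dataset with a dataset_name keyword, as recent langsmith releases do.

```python
def get_or_create_dataset(client, dataset_name, description=None):
    """Return the dataset called `dataset_name`, creating it only if missing.

    `client` is assumed to be a langsmith.Client, or any object exposing
    has_dataset, read_dataset and create_dataset with the same keywords.
    This helper is illustrative and not part of the SDK.
    """
    if client.has_dataset(dataset_name=dataset_name):
        # Reuse the existing dataset instead of triggering a name conflict.
        return client.read_dataset(dataset_name=dataset_name)
    return client.create_dataset(
        dataset_name=dataset_name,
        description=description,
    )


# Usage, assuming LANGSMITH_API_KEY is set in the environment:
#   from langsmith import Client
#   dataset = get_or_create_dataset(Client(), "My AI Project Dataset")
```

Passing the client in as an argument keeps the helper easy to exercise with a stub object, without network access.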
Common variations
You can attach additional metadata to a dataset by passing extra parameters, such as a metadata dictionary, to create_dataset. Async usage is also supported through AsyncClient, whose create_dataset method is awaited.
import asyncio
import os

from langsmith import AsyncClient

async def create_dataset_async():
    # AsyncClient is the awaitable counterpart of Client; the regular Client's methods cannot be awaited
    async with AsyncClient(api_key=os.environ["LANGSMITH_API_KEY"]) as async_client:
        response = await async_client.create_dataset(
            dataset_name="Async Dataset",
            description="Created asynchronously",
        )
        print(f"Async dataset ID: {response.id}")

asyncio.run(create_dataset_async())
Output:
Async dataset ID: <UUID of the new dataset>
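The metadata variation mentioned in this section can be sketched as follows. The helper name create_dataset_with_metadata is hypothetical, and the sketch assumes your installed SDK version accepts a metadata keyword on create_dataset; check the signature in your version before relying on it.

```python
import json


def create_dataset_with_metadata(client, dataset_name, description=None, metadata=None):
    """Create a dataset and attach a metadata dictionary to it.

    Illustrative helper, not part of the SDK. `client` is assumed to be a
    langsmith.Client whose create_dataset accepts a `metadata` keyword.
    """
    metadata = dict(metadata or {})
    # Fail early with a TypeError if a value cannot be serialized to JSON,
    # rather than discovering the problem when the request is built.
    json.dumps(metadata)
    return client.create_dataset(
        dataset_name=dataset_name,
        description=description,
        metadata=metadata,
    )


# Usage, assuming LANGSMITH_API_KEY is set in the environment:
#   from langsmith import Client
#   create_dataset_with_metadata(
#       Client(), "Tagged Dataset", metadata={"team": "search", "stage": "dev"}
#   )
```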
Troubleshooting
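- If you get an authentication error, verify that the LANGSMITH_API_KEY environment variable is set correctly.
- If dataset creation fails, check your network connection and API quota.
- Use print(response) to inspect the returned dataset. When the API rejects a request, the SDK raises an exception instead of returning the error, so catch it and print it to read the error message.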
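The checks above can be sketched as two small helpers: one that validates the API key before any request is made, and one that captures the error message when creation fails. Both names (require_api_key, try_create_dataset) are hypothetical, and the broad except clause is a simplification for diagnostics, not a recommendation for production code.

```python
import os


def require_api_key(env=None):
    """Return the LangSmith API key, or raise a descriptive error if it is missing."""
    env = os.environ if env is None else env
    key = env.get("LANGSMITH_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "LANGSMITH_API_KEY is not set or is empty; set it before creating datasets."
        )
    return key


def try_create_dataset(client, dataset_name, description=None):
    """Attempt to create a dataset and return (dataset, error_message).

    `client` is assumed to be a langsmith.Client. Exactly one of the two
    returned values is None.
    """
    try:
        dataset = client.create_dataset(
            dataset_name=dataset_name,
            description=description,
        )
        return dataset, None
    except Exception as exc:  # broad on purpose: we only want the message
        return None, f"{type(exc).__name__}: {exc}"


# Usage, assuming the langsmith package is installed:
#   from langsmith import Client
#   client = Client(api_key=require_api_key())
#   dataset, error = try_create_dataset(client, "My AI Project Dataset")
#   print(error or dataset.id)
```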
Key Takeaways
- Use the official langsmith Python SDK to create and manage datasets easily.
- Always set your API key in the LANGSMITH_API_KEY environment variable for authentication.
- Async dataset creation is supported for integration in asynchronous workflows.
- Include descriptive names and descriptions to organize datasets effectively.
- Check API responses and environment variables to troubleshoot common errors.