How to create datasets in Langfuse
Quick answer
Use the Langfuse Python SDK to create datasets by initializing a Langfuse client and calling create_dataset with a dataset name. This organizes your AI interactions for observability and analysis.
PREREQUISITES
- Python 3.8+
- Langfuse API key
- pip install langfuse
Setup
Install the langfuse Python package and set your API keys as environment variables before creating datasets.
pip install langfuse
Step by step
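Before initializing the client, it can help to confirm that both keys are actually present in the environment, since reading a missing variable with os.environ[...] raises a KeyError. The following is a minimal sketch of such a check. The helper name missing_langfuse_keys is hypothetical and is not part of the Langfuse SDK; only the two variable names come from this guide.

```python
import os

# Environment variable names used throughout this guide.
REQUIRED_KEYS = ("LANGFUSE_PUBLIC_KEY", "LANGFUSE_SECRET_KEY")


def missing_langfuse_keys(env=None):
    """Return the required variable names that are unset or empty.

    Hypothetical helper for illustration; not part of the Langfuse SDK.
    """
    env = os.environ if env is None else env
    return [name for name in REQUIRED_KEYS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_langfuse_keys()
    if missing:
        print("Missing environment variables:", ", ".join(missing))
    else:
        print("Both Langfuse keys are set.")
```

Passing a plain dictionary instead of relying on os.environ keeps the check easy to exercise in tests without touching the real environment.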
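Initialize the Langfuse client with your API keys, then create a dataset by calling create_dataset with a unique name. A dataset is a named collection of items, each holding an input and optionally an expected output, and items can be linked to the traces they came from.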
import os
from langfuse import Langfuse
# Initialize Langfuse client with your public and secret keys
langfuse = Langfuse(
    public_key=os.environ["LANGFUSE_PUBLIC_KEY"],
    secret_key=os.environ["LANGFUSE_SECRET_KEY"],
    host="https://cloud.langfuse.com"
)
# Create a new dataset
response = langfuse.create_dataset(name="my-ai-dataset")
print("Dataset created:", response)
output
Dataset created: {'id': 'dataset_abc123', 'name': 'my-ai-dataset', 'created_at': '2026-04-01T12:00:00Z'}
Common variations
You can create multiple datasets for different projects or environments by calling create_dataset with different names. Use the get_dataset method to retrieve an existing dataset by name. The create_dataset call itself is synchronous, so in an async application run it in a worker thread to avoid blocking the event loop.
import asyncio
import os
from langfuse import Langfuse

async def create_dataset_async():
    langfuse = Langfuse(
        public_key=os.environ["LANGFUSE_PUBLIC_KEY"],
        secret_key=os.environ["LANGFUSE_SECRET_KEY"],
        host="https://cloud.langfuse.com"
    )
    # create_dataset is a blocking call, so run it in the default thread pool
    loop = asyncio.get_running_loop()
    response = await loop.run_in_executor(
        None, lambda: langfuse.create_dataset(name="async-dataset")
    )
    print("Async dataset created:", response)

asyncio.run(create_dataset_async())
output
Async dataset created: {'id': 'dataset_xyz789', 'name': 'async-dataset', 'created_at': '2026-04-01T12:05:00Z'}
Troubleshooting
- If you get authentication errors, verify your LANGFUSE_PUBLIC_KEY and LANGFUSE_SECRET_KEY environment variables are set correctly.
- If dataset creation fails due to name conflicts, choose a unique dataset name.
- Check network connectivity to https://cloud.langfuse.com if requests time out.
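For the timeout case, one option is to retry the call a few times before giving up. The sketch below is a generic retry wrapper, not a Langfuse feature: the name call_with_retries and its parameters are hypothetical, and it catches Exception broadly only because this guide does not name the SDK's specific error types.

```python
import time


def call_with_retries(func, attempts=3, delay_seconds=1.0):
    """Call func() and retry on any exception, up to `attempts` times.

    Hypothetical helper for illustration; not part of the Langfuse SDK.
    The last exception is re-raised if every attempt fails.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except Exception as error:  # assumption: SDK error types are not known here
            last_error = error
            if attempt < attempts:
                # Wait before the next attempt; skipped after the final failure.
                time.sleep(delay_seconds)
    raise last_error
```

With a client named langfuse, usage would look like call_with_retries(lambda: langfuse.create_dataset(name="my-ai-dataset")). Retrying only helps with transient failures such as timeouts; an authentication error will fail the same way on every attempt.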
Key Takeaways
- Use the Langfuse Python SDK's create_dataset method to create a named dataset for your AI application's inputs and expected outputs.
- Set your public and secret API keys as environment variables before use.
- You can manage multiple datasets for different projects or environments.
- The SDK can be used from async applications; run blocking calls such as create_dataset off the event loop.
- Ensure dataset names are unique to avoid conflicts during creation.
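The takeaways about multiple datasets and unique names can be combined into a simple naming convention. The sketch below assumes a project-environment pattern such as chatbot-staging; the functions dataset_name and create_datasets are hypothetical, and the client argument is only assumed to expose create_dataset(name=...), as the Langfuse client does in this guide.

```python
def dataset_name(project, environment):
    """Build a dataset name such as 'chatbot-staging'.

    Hypothetical naming convention for illustration only.
    """
    parts = [project.strip().lower(), environment.strip().lower()]
    if not all(parts):
        raise ValueError("project and environment must be non-empty")
    # Replace inner spaces so each part becomes a single hyphenated token.
    return "-".join(part.replace(" ", "-") for part in parts)


def create_datasets(client, project, environments):
    """Create one dataset per environment and return the names used.

    `client` is assumed to expose create_dataset(name=...).
    """
    names = [dataset_name(project, env) for env in environments]
    for name in names:
        client.create_dataset(name=name)
    return names
```

Deriving names from a single function keeps them predictable across environments, which makes accidental collisions easier to spot.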