
How to create datasets in Langfuse

Quick answer
Use the Langfuse Python SDK to create a dataset: initialize a Langfuse client and call create_dataset with a dataset name. In Langfuse, a dataset is a named collection of test inputs and expected outputs that you use to evaluate your LLM application.

PREREQUISITES

  • Python 3.8+
  • Langfuse API keys (public and secret)
  • pip install langfuse

Setup

Install the langfuse Python package, then set LANGFUSE_PUBLIC_KEY and LANGFUSE_SECRET_KEY as environment variables before creating datasets.

bash
pip install langfuse
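
Before creating the client, it can help to confirm that both keys are actually present in the environment. The following is a minimal sketch using only the standard library. The helper name missing_langfuse_env is hypothetical and not part of the Langfuse SDK; the two variable names are the ones used in this guide.

```python
import os

# Variable names used throughout this guide
REQUIRED_VARS = ("LANGFUSE_PUBLIC_KEY", "LANGFUSE_SECRET_KEY")


def missing_langfuse_env(env=None):
    """Return the required variable names that are unset or empty.

    Hypothetical helper for illustration; not part of the Langfuse SDK.
    """
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


missing = missing_langfuse_env()
if missing:
    print("Missing environment variables:", ", ".join(missing))
else:
    print("Langfuse keys found in the environment")
```

Running this check first turns a KeyError raised during client setup into a message that names the missing variable.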

Step by step

Initialize the Langfuse client with your API keys, then call create_dataset with a unique name. The new dataset starts empty; the inputs and expected outputs you add to it later can be used to test your application.

python
import os
from langfuse import Langfuse

# Initialize Langfuse client with your public and secret keys
langfuse = Langfuse(
    public_key=os.environ["LANGFUSE_PUBLIC_KEY"],
    secret_key=os.environ["LANGFUSE_SECRET_KEY"],
    host="https://cloud.langfuse.com"
)

# Create a new dataset; the returned object exposes the dataset's name
response = langfuse.create_dataset(name="my-ai-dataset")
print("Dataset created:", response.name)
output
Dataset created: my-ai-dataset

Common variations

You can create multiple datasets for different projects or environments by calling create_dataset with different names. To retrieve an existing dataset, call get_dataset with its name. create_dataset is a blocking call, so in an async application run it in a worker thread rather than awaiting it directly.

python
import asyncio
import functools
import os

from langfuse import Langfuse

async def create_dataset_async():
    langfuse = Langfuse(
        public_key=os.environ["LANGFUSE_PUBLIC_KEY"],
        secret_key=os.environ["LANGFUSE_SECRET_KEY"],
        host="https://cloud.langfuse.com"
    )
    # Run the blocking call in the default thread pool so the event loop stays free
    loop = asyncio.get_running_loop()
    response = await loop.run_in_executor(
        None, functools.partial(langfuse.create_dataset, name="async-dataset")
    )
    print("Async dataset created:", response.name)

asyncio.run(create_dataset_async())
output
Async dataset created: async-dataset
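
Creating one dataset per environment can be sketched as a small loop. This is an illustration only: the helper create_datasets_for_environments, the "<base>-<env>" naming scheme, and the RecordingClient stand-in are all assumptions made for this example, not part of the Langfuse SDK. The only SDK call it relies on is create_dataset(name=...), as used above.

```python
def create_datasets_for_environments(client, base_name, environments):
    """Create one dataset per environment and return the names used.

    `client` is any object with a create_dataset(name=...) method, such as
    the Langfuse client from the step-by-step example. The "<base>-<env>"
    naming scheme is an assumption made for this sketch.
    """
    names = []
    for env in environments:
        name = f"{base_name}-{env}"
        client.create_dataset(name=name)
        names.append(name)
    return names


class RecordingClient:
    """Stand-in client so the sketch runs without network access."""

    def __init__(self):
        self.created = []

    def create_dataset(self, name):
        self.created.append(name)


demo = RecordingClient()
print(create_datasets_for_environments(demo, "my-ai-dataset", ["dev", "staging", "prod"]))
# prints ['my-ai-dataset-dev', 'my-ai-dataset-staging', 'my-ai-dataset-prod']
```

To use it against Langfuse, pass the real client in place of the RecordingClient instance.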

Troubleshooting

  • If you get authentication errors, verify your LANGFUSE_PUBLIC_KEY and LANGFUSE_SECRET_KEY environment variables are set correctly.
  • If dataset creation fails due to name conflicts, choose a unique dataset name.
  • Check network connectivity to https://cloud.langfuse.com if requests time out.
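
For intermittent timeouts, one option is to retry the call a few times before giving up. The sketch below is a generic retry wrapper, not a Langfuse feature. The helper name call_with_retry is hypothetical, and because this guide does not cover the SDK's exception classes, it catches Exception broadly. A flaky stand-in function simulates the timeouts so the example runs without network access.

```python
import time


def call_with_retry(func, attempts=3, delay_seconds=1.0):
    """Call func() until it succeeds or the attempts run out.

    Hypothetical helper for illustration. The last failure is re-raised
    unchanged so the caller still sees the original error.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except Exception as exc:  # broad on purpose; see the note above
            if attempt == attempts:
                raise
            print(f"Attempt {attempt} failed ({exc}); retrying in {delay_seconds}s")
            time.sleep(delay_seconds)


# Simulated call that times out twice and then succeeds
calls = {"count": 0}


def flaky_create():
    calls["count"] += 1
    if calls["count"] < 3:
        raise TimeoutError("simulated timeout")
    return "created"


print(call_with_retry(flaky_create, attempts=3, delay_seconds=0))
```

With the real client, the wrapped call would be something like call_with_retry(lambda: langfuse.create_dataset(name="my-ai-dataset")). Retrying does not help with authentication errors, so check your keys first.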

Key Takeaways

  • Use the Langfuse Python SDK's create_dataset method to create a named dataset for testing and evaluating your application.
  • Set your public and secret API keys as environment variables before use.
  • You can manage multiple datasets for different projects or environments.
  • create_dataset is a blocking call; in async applications, run it in a worker thread.
  • Ensure dataset names are unique to avoid conflicts during creation.
Verified 2026-04