How to create datasets in Langfuse

Quick answer

Initialize a Langfuse client from the Langfuse Python SDK and call create_dataset with a dataset name. A dataset is a named collection of items (inputs and, optionally, expected outputs) that you use to test and evaluate an LLM application.

PREREQUISITES

Python 3.8+
Langfuse API keys (public key and secret key)
pip install langfuse

Setup

Install the langfuse Python package and set your API keys as environment variables before creating datasets.

bash

pip install langfuse
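
Before creating a client, it can help to confirm that both keys are actually present in the environment. The sketch below is a minimal check and is not part of the Langfuse SDK: the helper name missing_langfuse_env is hypothetical, and the variable names are the ones used in this guide.

```python
import os

# Environment variable names used throughout this guide.
REQUIRED_VARS = ("LANGFUSE_PUBLIC_KEY", "LANGFUSE_SECRET_KEY")


def missing_langfuse_env(env=None):
    """Return the required variable names that are unset or empty.

    Hypothetical helper for illustration; it is not part of the Langfuse SDK.
    """
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_langfuse_env()
    if missing:
        print("Missing environment variables:", ", ".join(missing))
    else:
        print("Langfuse keys found.")
```

Running this before the main script gives a clearer message than the KeyError raised by os.environ[...] when a key is absent.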

Step by step

Initialize the Langfuse client with your public and secret keys, then create a dataset by calling create_dataset with a unique name. The dataset starts empty; it is the container to which you later add the items used for testing and evaluation.

python

import os
from langfuse import Langfuse

# Initialize Langfuse client with your public and secret keys
langfuse = Langfuse(
    public_key=os.environ["LANGFUSE_PUBLIC_KEY"],
    secret_key=os.environ["LANGFUSE_SECRET_KEY"],
    host="https://cloud.langfuse.com"
)

# Create a new dataset; the call returns a Dataset object
dataset = langfuse.create_dataset(name="my-ai-dataset")
print("Dataset created:", dataset.name)

output

Dataset created: my-ai-dataset

Common variations

You can create multiple datasets for different projects or environments by calling create_dataset with different names. To retrieve an existing dataset, call get_dataset with its name. create_dataset is a blocking call, so in an async application run it in a worker thread instead of awaiting it directly.

python

import asyncio
import functools
import os

from langfuse import Langfuse

async def create_dataset_async():
    langfuse = Langfuse(
        public_key=os.environ["LANGFUSE_PUBLIC_KEY"],
        secret_key=os.environ["LANGFUSE_SECRET_KEY"],
        host="https://cloud.langfuse.com"
    )
    # create_dataset is synchronous, so run it in the default thread pool
    loop = asyncio.get_running_loop()
    dataset = await loop.run_in_executor(
        None, functools.partial(langfuse.create_dataset, name="async-dataset")
    )
    print("Async dataset created:", dataset.name)

asyncio.run(create_dataset_async())

output

Async dataset created: async-dataset
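
The idea of one dataset per environment can be sketched as a small helper. The naming scheme (base name plus environment suffix) and the helper names below are assumptions made for illustration; the only SDK call used is create_dataset(name=...). The client is passed in as an argument so the helper can be exercised without network access.

```python
def dataset_names(base, environments):
    """Build one dataset name per environment, e.g. 'my-ai-dataset-dev'."""
    return [f"{base}-{env}" for env in environments]


def create_datasets_for_environments(client, base, environments):
    """Call client.create_dataset once per environment and collect the results.

    `client` is expected to be a Langfuse client; any object exposing a
    create_dataset(name=...) method works. Hypothetical helper, not SDK API.
    """
    return [
        client.create_dataset(name=name)
        for name in dataset_names(base, environments)
    ]
```

With an initialized Langfuse client named langfuse, calling create_datasets_for_environments(langfuse, "my-ai-dataset", ["dev", "staging", "prod"]) would issue three create_dataset calls, one per environment.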

Troubleshooting

If you get authentication errors, verify your LANGFUSE_PUBLIC_KEY and LANGFUSE_SECRET_KEY environment variables are set correctly.
If dataset creation fails due to name conflicts, choose a unique dataset name.
Check network connectivity to https://cloud.langfuse.com if requests time out.
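
One way to avoid name conflicts is to append a short random suffix to a base name. This is a sketch under that assumption: unique_dataset_name is a hypothetical helper, and whether a suffix scheme suits your project is a design choice, not a Langfuse requirement.

```python
import uuid


def unique_dataset_name(base, suffix_length=8):
    """Return `base` followed by a hyphen and a random hexadecimal suffix."""
    suffix = uuid.uuid4().hex[:suffix_length]
    return f"{base}-{suffix}"
```

The result can be passed directly as the name argument of create_dataset. Store the generated name if you need to look the dataset up again later.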

Key Takeaways

Use the Langfuse Python SDK's create_dataset method to create a named dataset for testing and evaluating your application.
Set your public and secret API keys as environment variables before use.
You can manage multiple datasets for different projects or environments.
You can call the SDK from async applications, for example by running create_dataset in a worker thread.
Ensure dataset names are unique to avoid conflicts during creation.

Verified 2026-04
