
How to push a dataset to the Hugging Face Hub

Quick answer
Use the datasets library's Dataset.push_to_hub() method to upload your dataset to the Hugging Face Hub. Authenticate first, either with huggingface-cli login or by setting the HF_TOKEN environment variable, then call dataset.push_to_hub(repo_id="your-username/dataset-name").

Prerequisites

  • Python 3.8+
  • pip install datasets huggingface_hub
  • Hugging Face account and an access token with write permission

Setup

Install the required libraries and authenticate your Hugging Face account to enable pushing datasets to the hub.

bash
pip install datasets huggingface_hub
huggingface-cli login
output
Token stored successfully
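
On machines where an interactive login is impractical (CI jobs, notebooks), the token can come from the environment instead. The sketch below is one way to do that, assuming the token is stored in HF_TOKEN; the helper names get_hf_token and login_from_env are hypothetical, and only huggingface_hub.login is a library call.

```python
import os


def get_hf_token(env_var: str = "HF_TOKEN") -> str:
    """Return the access token from the environment, or raise a clear error."""
    token = os.environ.get(env_var, "").strip()
    if not token:
        raise RuntimeError(
            f"Set {env_var} to a Hugging Face access token with write permission."
        )
    return token


def login_from_env() -> None:
    """Log in to the Hub with the token found in the environment."""
    # Imported lazily so get_hf_token() works even without huggingface_hub installed
    from huggingface_hub import login

    login(token=get_hf_token())
```

Calling login_from_env() once at the start of a script plays the same role as running huggingface-cli login beforehand.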

Step by step

This example creates a simple dataset and pushes it to the Hugging Face hub under your username.

python
from datasets import Dataset

# Create a simple dataset
data = {"text": ["Hello world", "Hugging Face"], "label": [0, 1]}
dataset = Dataset.from_dict(data)

# Push dataset to Hugging Face hub
repo_id = "your-username/my-sample-dataset"
dataset.push_to_hub(repo_id)

print(f"Dataset pushed to https://huggingface.co/datasets/{repo_id}")
output
Dataset pushed to https://huggingface.co/datasets/your-username/my-sample-dataset
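
To confirm that the upload worked, you can load the dataset back from the Hub and compare row counts. This is a minimal sketch, assuming the data was pushed as the default "train" split; reload_and_check and its load_fn parameter are hypothetical names introduced here so the check can also be exercised without network access.

```python
def reload_and_check(repo_id, expected_rows, load_fn=None):
    """Reload the train split from the Hub and compare its row count."""
    if load_fn is None:
        # Imported lazily; the default performs a real download from the Hub
        from datasets import load_dataset

        load_fn = load_dataset
    reloaded = load_fn(repo_id, split="train")
    return len(reloaded) == expected_rows
```

For the two-row dataset above, reload_and_check("your-username/my-sample-dataset", 2) should return True once the push has completed.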

Common variations

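For more control, use the huggingface_hub library directly: create the repo yourself (for example, as a private repo), write the data files locally, and upload them with HfApi. The example below does this for a private dataset. Metadata can be added through the repo's dataset card (README.md).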

python
import os

from datasets import Dataset
from huggingface_hub import HfApi

# Create dataset
data = {"text": ["Private example"], "label": [1]}
dataset = Dataset.from_dict(data)

# Use HfApi to create a private dataset repo (exist_ok makes re-runs safe)
api = HfApi()
repo_id = "your-username/my-private-dataset"
api.create_repo(repo_id, repo_type="dataset", private=True, exist_ok=True)

# Write the data as Parquet, which the Hub can load directly, then upload the folder.
# (save_to_disk output is meant for load_from_disk, not for loading from the Hub.)
os.makedirs("./local_dataset", exist_ok=True)
dataset.to_parquet("./local_dataset/train.parquet")
api.upload_folder(folder_path="./local_dataset", repo_id=repo_id, repo_type="dataset")

print(f"Private dataset pushed to https://huggingface.co/datasets/{repo_id}")
output
Private dataset pushed to https://huggingface.co/datasets/your-username/my-private-dataset
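
If a private repo is all you need, push_to_hub can create it in a single call through its private argument, with no manual upload. The sketch below wraps that call; push_private is a hypothetical helper, and the commit_message argument is assumed to be available, which holds for recent releases of datasets.

```python
def push_private(dataset, repo_id, commit_message="Upload dataset"):
    """Push a dataset to a private Hub repo and return the repo URL."""
    # private=True asks the Hub to create the repo as private if it does not exist yet
    dataset.push_to_hub(repo_id, private=True, commit_message=commit_message)
    return f"https://huggingface.co/datasets/{repo_id}"
```

With a Dataset built as in the earlier example, push_private(dataset, "your-username/my-private-dataset") uploads the data and returns the URL to open in a browser.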

Troubleshooting

  • If you get authentication errors, make sure you ran huggingface-cli login or set the HF_TOKEN environment variable.
  • For permission-denied errors, verify that your token has write access to the target repo.
  • If push_to_hub fails, check your internet connection and the repo ID format (username/dataset-name).
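
A malformed repo ID is easy to catch before any network call. The check below is a simplified sketch, not the Hub's exact naming rules: it only verifies the username/dataset-name shape with a conservative character set, and looks_like_repo_id is a hypothetical helper name.

```python
import re

# One path segment: starts with a letter or digit, then letters, digits, ".", "_" or "-"
_NAME = r"[A-Za-z0-9][A-Za-z0-9._-]*"
_REPO_ID = re.compile(rf"{_NAME}/{_NAME}")


def looks_like_repo_id(repo_id: str) -> bool:
    """Return True if repo_id has the form username/dataset-name."""
    return bool(_REPO_ID.fullmatch(repo_id))
```

For example, looks_like_repo_id("your-username/my-sample-dataset") returns True, while a bare "my-sample-dataset" or a value containing spaces returns False.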

Key Takeaways

  • Use the datasets library's push_to_hub method for simple dataset uploads.
  • Authenticate with huggingface-cli login or the HF_TOKEN environment variable before pushing.
  • You can create private dataset repos via huggingface_hub's HfApi.
  • Always verify your token permissions if you encounter access errors.
Verified 2026-04