Sharing Datasets on Hugging Face Hub

Hugging Face Hub is the GitHub of AI data and models. Uploading a dataset there makes it instantly accessible to millions of practitioners.

40 min · Reviewed 2026

The Default Home of AI Data

Hugging Face Hub hosts over 200,000 datasets and over 1 million models as of 2024. Uploading your dataset there gives it a citable home, version history, a built-in viewer, and programmatic access from any project that uses the datasets library. Hosting is free for public datasets.

Step 1: install and authenticate

pip install huggingface_hub datasets

# Log in (grab a token from https://huggingface.co/settings/tokens)
huggingface-cli login
One-time setup
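The interactive login above suits a laptop. For scripts or CI jobs, a non-interactive variant can be sketched as follows. It assumes the token is stored in the `HF_TOKEN` environment variable; the helper names `read_hub_token` and `login_from_env` are hypothetical and exist only for this illustration.

```python
import os


def read_hub_token(env=None):
    """Return the token stored in HF_TOKEN, or raise if it is missing."""
    env = os.environ if env is None else env
    token = env.get("HF_TOKEN", "").strip()
    if not token:
        raise RuntimeError(
            "Set HF_TOKEN to a token from https://huggingface.co/settings/tokens"
        )
    return token


def login_from_env():
    """Log in without a prompt, using the token from the environment."""
    # Imported here so the token check above works even before the
    # library is installed.
    from huggingface_hub import login

    login(token=read_hub_token())
```

Calling `login_from_env()` once at the start of a script takes the place of the CLI prompt. Keep the token out of source control either way.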

Step 2: prepare your data

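import pandas as pd
from datasets import Dataset, DatasetDict

df = pd.read_csv('labeled_complaints.csv')

# Convert to a Hugging Face Dataset
ds = Dataset.from_pandas(df)
print(ds)

# Create an 80/10/10 train/validation/test split:
# hold out 20%, then cut the held-out part in half
split = ds.train_test_split(test_size=0.2, seed=42)
holdout = split['test'].train_test_split(test_size=0.5, seed=42)
ds = DatasetDict({
    'train': split['train'],
    'validation': holdout['train'],
    'test': holdout['test'],
})
print(ds)
Convert pandas to a Hugging Face Dataset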
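Before converting and splitting, it can help to audit the raw rows, since duplicate texts may end up on both sides of a split and stray label values will confuse the viewer. The sketch below uses only the standard library. The column names `text` and `label` are assumptions about `labeled_complaints.csv`, and `audit_rows` is a hypothetical helper, not part of any library.

```python
import csv
import io
from collections import Counter

# Label ids as strings, because csv.DictReader yields text values.
ALLOWED_LABELS = {"0", "1", "2"}  # complaint, praise, neither


def audit_rows(rows, text_col="text", label_col="label"):
    """Return (label_counts, problems) for an iterable of dict rows."""
    counts, problems, seen = Counter(), [], set()
    for i, row in enumerate(rows, start=1):
        text = (row.get(text_col) or "").strip()
        label = (row.get(label_col) or "").strip()
        if not text:
            problems.append(f"row {i}: empty text")
        elif text in seen:
            problems.append(f"row {i}: duplicate text")
        seen.add(text)
        if label not in ALLOWED_LABELS:
            problems.append(f"row {i}: unexpected label {label!r}")
        else:
            counts[label] += 1
    return counts, problems


# Tiny in-memory sample standing in for the real CSV file.
sample = io.StringIO("text,label\nLove it,1\nBroken again,0\nLove it,1\nok,7\n")
counts, problems = audit_rows(csv.DictReader(sample))
print(dict(counts))
print(problems)
```

To run it on the real file, pass `csv.DictReader(open('labeled_complaints.csv', newline=''))` instead of the sample.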

Step 3: write a README / data card

---
language:
  - en
license: cc-by-4.0
task_categories:
  - text-classification
task_ids:
  - sentiment-classification
size_categories:
  - n<1K
pretty_name: Tweet Complaints vs Praise
---

# Tweet Complaints vs. Praise

## Description
500 English tweets labeled as complaint, praise, or neither,
collected from public data in 2026.

## Sources
Sampled from cardiffnlp/tweet_eval; relabeled by two annotators.

## Labels
- 0 = complaint
- 1 = praise
- 2 = neither

## Agreement
Cohen's kappa between annotators: 0.78 (substantial)

## Limitations
- English only
- Skewed toward consumer tech topics
- Labels reflect US cultural context; may not transfer

## License
CC-BY-4.0. Please cite Tendril Content Team, 2026.
A Hugging Face dataset card
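The Agreement section of the card reports Cohen's kappa, which corrects raw agreement for the agreement two annotators would reach by chance: kappa = (p_o - p_e) / (1 - p_e). A minimal sketch of that calculation follows. The two label lists are made-up illustration data, not the annotations behind the 0.78 figure in the card.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labeled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # p_o: share of items where both annotators chose the same label
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # p_e: agreement expected by chance, from each annotator's label frequencies
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[k] * freq_b[k] for k in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (observed - expected) / (1 - expected)


# Hypothetical annotations: 0 = complaint, 1 = praise, 2 = neither
annotator_1 = [0, 0, 1, 1, 2, 2, 0, 1, 2, 0]
annotator_2 = [0, 0, 1, 1, 2, 0, 0, 1, 2, 1]
kappa = cohens_kappa(annotator_1, annotator_2)
print(round(kappa, 2))
```

Here the annotators agree on 8 of 10 items (p_o = 0.8) while chance agreement is p_e = 0.34, so kappa comes out lower than the raw agreement rate.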

Step 4: push it

# ds is the DatasetDict of splits created in Step 2

# Push to your Hugging Face account
ds.push_to_hub('your-username/tweet-complaints-praise')

# Or save locally first, then upload via git
# ds.save_to_disk('./tweet-complaints-praise')
One-line publish
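Steps 3 and 4 can also be combined so that the dataset card and the data land in the same repository. The sketch below takes the dataset and an API client as arguments; `publish` is a hypothetical helper. It uploads the card first on the assumption that `push_to_hub` then merges its own metadata into the existing README rather than replacing it, which is worth confirming against your installed version of the library.

```python
def publish(ds, api, repo_id, readme_path="README.md", private=True):
    """Create the dataset repo, upload the card, then push the splits."""
    # exist_ok=True makes re-running the script safe.
    api.create_repo(repo_id, repo_type="dataset", private=private, exist_ok=True)
    api.upload_file(
        path_or_fileobj=readme_path,
        path_in_repo="README.md",
        repo_id=repo_id,
        repo_type="dataset",
    )
    ds.push_to_hub(repo_id, private=private)
    return f"https://huggingface.co/datasets/{repo_id}"


# Usage, assuming `ds` is the DatasetDict from Step 2 and you are logged in:
#   from huggingface_hub import HfApi
#   url = publish(ds, HfApi(), "your-username/tweet-complaints-praise")
```

Starting with `private=True` lets you inspect the viewer before making the dataset public from the repository settings.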

Step 5: verify and share

Visit https://huggingface.co/datasets/your-username/tweet-complaints-praise
Confirm the viewer loads and splits look right
Ensure README renders; fix any YAML errors
Add tags so others can find it
Share the link in relevant communities
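Part of this checklist can be automated. The sketch below checks that the expected splits exist and are non-empty. It accepts any mapping of split name to rows, so it should work on the object returned by `load_dataset` as well as on plain dictionaries. `check_splits` is a hypothetical helper.

```python
def check_splits(splits, expected=("train", "validation", "test")):
    """Return a list of problems found in a mapping of split name -> rows."""
    problems = [f"missing split: {name}" for name in expected if name not in splits]
    problems += [
        f"empty split: {name}" for name, rows in splits.items() if len(rows) == 0
    ]
    return problems


# Usage against the published repo (requires network access):
#   from datasets import load_dataset
#   print(check_splits(load_dataset("your-username/tweet-complaints-praise")))

# Offline illustration with plain lists standing in for splits:
report = check_splits({"train": [1, 2, 3], "test": []})
print(report)
```

An empty list means the splits look as expected. The viewer and README rendering still need a visual check in the browser.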

Good practices for Hub releases

Use Parquet format (faster than CSV for the viewer)
Keep individual files under 5 GB
Include train/validation/test splits
Version your dataset (v1.0, v2.0) rather than overwriting
Respond to issues and discussions in the community tab
If you discover a problem later, release a corrected version with a changelog
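One way to version a release rather than overwrite it is to tag the repository after each push, so that users can pin a specific version when loading. The sketch below assumes an `HfApi` client is passed in; `release` is a hypothetical helper and the version string and message are placeholders.

```python
def release(api, repo_id, version, message):
    """Tag the current state of a dataset repo as v<version>."""
    tag = f"v{version}"
    api.create_tag(repo_id, tag=tag, tag_message=message, repo_type="dataset")
    return tag


# Usage, assuming you are logged in and the data is already pushed:
#   from huggingface_hub import HfApi
#   from datasets import load_dataset
#   release(HfApi(), "your-username/tweet-complaints-praise", "1.0", "Initial release")
#   ds_v1 = load_dataset("your-username/tweet-complaints-praise", revision="v1.0")
```

When a corrected version is released, the older tag keeps pointing at the earlier data, and the tag message can double as a short changelog entry.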

The big idea: publishing a dataset on Hugging Face is the 21st-century equivalent of publishing a paper. It is permanent, searchable, usable, and attributable. If you build a dataset, ship it. The community learns when you share.

End-of-lesson check

15 questions · take it digitally for instant feedback at tendril.neural-forge.io/learn/quiz/end-data-sharing-huggingface

What is the core idea behind "Sharing Datasets on Hugging Face Hub"?
1. Hugging Face Hub is the GitHub of AI data and models. Uploading a dataset there makes it instantly accessible to millions of practitioners.
2. A/B test results where a minority group reverses the majority trend
3. kernel
4. Voice memos
Which term best describes a foundational idea in "Sharing Datasets on Hugging Face Hub"?
1. dataset card
2. Hugging Face
3. push_to_hub
4. Parquet
A learner studying Sharing Datasets on Hugging Face Hub would need to understand which concept?
1. Hugging Face
2. push_to_hub
3. dataset card
4. Parquet
Which of these is directly relevant to Sharing Datasets on Hugging Face Hub?
1. Hugging Face
2. dataset card
3. Parquet
4. push_to_hub
Which of the following is a key point about Sharing Datasets on Hugging Face Hub?
1. Visit https://huggingface.co/datasets/your-username/tweet-complaints-praise
2. Confirm the viewer loads and splits look right
3. Ensure README renders; fix any YAML errors
4. Add tags so others can find it
Which of these does NOT belong in a discussion of Sharing Datasets on Hugging Face Hub?
1. Ensure README renders; fix any YAML errors
2. Confirm the viewer loads and splits look right
3. A/B test results where a minority group reverses the majority trend
4. Visit https://huggingface.co/datasets/your-username/tweet-complaints-praise
Which statement is accurate regarding Sharing Datasets on Hugging Face Hub?
1. Keep individual files under 5 GB
2. Include train/validation/test splits
3. Use Parquet format (faster than CSV for the viewer)
4. Version your dataset (v1.0, v2.0) rather than overwriting
Which of these does NOT belong in a discussion of Sharing Datasets on Hugging Face Hub?
1. Use Parquet format (faster than CSV for the viewer)
2. Include train/validation/test splits
3. A/B test results where a minority group reverses the majority trend
4. Keep individual files under 5 GB
Which statement accurately describes an aspect of Sharing Datasets on Hugging Face Hub?
1. Hugging Face Hub hosts over 200,000 datasets and over 1 million models as of 2024.
2. A/B test results where a minority group reverses the majority trend
3. kernel
4. Voice memos
What does working with Sharing Datasets on Hugging Face Hub typically involve?
1. A/B test results where a minority group reverses the majority trend
2. The big idea: publishing a dataset on Hugging Face is the 21st-century equivalent of publishing a paper.
3. kernel
4. Voice memos
Which best describes the scope of "Sharing Datasets on Hugging Face Hub"?
1. It is unrelated to foundations workflows
2. It applies only to the opposite beginner tier
3. It focuses on uploading a dataset to Hugging Face Hub so that it is instantly accessible to practitioners
4. It was deprecated in 2024 and no longer relevant
Which section heading best belongs in a lesson about Sharing Datasets on Hugging Face Hub?
1. A/B test results where a minority group reverses the majority trend
2. kernel
3. Voice memos
4. Step 1: install and authenticate
Which section heading best belongs in a lesson about Sharing Datasets on Hugging Face Hub?
1. Step 2: prepare your data
2. A/B test results where a minority group reverses the majority trend
3. kernel
4. Voice memos
Which section heading best belongs in a lesson about Sharing Datasets on Hugging Face Hub?
1. A/B test results where a minority group reverses the majority trend
2. Step 3: write a README / data card
3. kernel
4. Voice memos
Which section heading best belongs in a lesson about Sharing Datasets on Hugging Face Hub?
1. A/B test results where a minority group reverses the majority trend
2. kernel
3. Step 4: push it
4. Voice memos
