Hugging Face Hub is the GitHub of AI data and models. Uploading a dataset there makes it instantly accessible to millions of practitioners.
Hugging Face Hub hosts over 200,000 datasets and over 1 million models as of 2024. Uploading your dataset there gives it a citable home, version history, a built-in viewer, and programmatic access from any project that uses the datasets library. Hosting is free for public datasets.
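To make "programmatic access" concrete, here is a minimal sketch of how another project might pull a published dataset. It assumes the datasets package is installed and that a public repository already exists under the placeholder name your-username/tweet-complaints-praise; the helper name load_published_split is hypothetical and only used for illustration.

```python
def load_published_split(repo_id="your-username/tweet-complaints-praise", split="train"):
    """Download one split of a public Hub dataset and return it.

    The default repo_id is a placeholder, not a real repository.
    """
    # Imported inside the function so this sketch can be defined even
    # where the `datasets` package is not installed yet.
    from datasets import load_dataset

    # load_dataset fetches the files from the Hub and caches them locally.
    return load_dataset(repo_id, split=split)


# Example call (needs network access and an existing public repository):
# train = load_published_split()
# print(train[0])
```

Passing split="train" returns a single Dataset; leaving the split out would return every available split at once.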
    pip install huggingface_hub datasets
    # Log in (grab a token from https://huggingface.co/settings/tokens)
    huggingface-cli login

One-time setup

    import pandas as pd
    from datasets import Dataset

    df = pd.read_csv('labeled_complaints.csv')

    # Convert to a Hugging Face Dataset
    ds = Dataset.from_pandas(df)
    print(ds)

    # Create an 80/20 train/test split (returns a DatasetDict with 'train' and 'test')
    ds = ds.train_test_split(test_size=0.2, seed=42)
    print(ds)

Convert pandas to a Hugging Face Dataset

    ---
    language:
    - en
    license: cc-by-4.0
    task_categories:
    - text-classification
    task_ids:
    - sentiment-classification
    size_categories:
    - n<1K
    pretty_name: Tweet Complaints vs Praise
    ---

    # Tweet Complaints vs. Praise

    ## Description
    500 English tweets labeled as complaint, praise, or neither, collected from public data in 2026.

    ## Sources
    Sampled from cardiffnlp/tweet_eval; relabeled by two annotators.

    ## Labels
    - 0 = complaint
    - 1 = praise
    - 2 = neither

    ## Agreement
    Cohen's kappa between annotators: 0.78 (substantial)

    ## Limitations
    - English only
    - Skewed toward consumer tech topics
    - Labels reflect US cultural context; may not transfer

    ## License
    CC-BY-4.0. Please cite Tendril Content Team, 2026.

A Hugging Face dataset card

    from datasets import DatasetDict

    # Push to your Hugging Face account
    # (ds is the DatasetDict returned by train_test_split above)
    ds.push_to_hub('your-username/tweet-complaints-praise')

    # Or save locally first, then upload via git
    # ds.save_to_disk('./tweet-complaints-praise')

One-line publish

The big idea: publishing a dataset on Hugging Face is the 21st-century equivalent of publishing a paper. It is permanent, searchable, usable, and attributable. If you build a dataset, ship it. The community learns when you share.
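The dataset card above reports Cohen's kappa between the two annotators. As a rough sketch of how such a figure could be computed before filling in the card, the snippet below implements the standard formula (observed agreement minus chance agreement, divided by one minus chance agreement) in plain Python. The function name and the ten toy labels are hypothetical and are not the lesson's real annotations.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labeled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)

    # Observed agreement: fraction of items where both annotators agree.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Chance agreement, from each annotator's own label frequencies.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    p_e = sum(freq_a[k] * freq_b[k] for k in freq_a) / (n * n)

    if p_e == 1.0:
        # Both annotators used one identical label for every item.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


# Hypothetical toy labels (0 = complaint, 1 = praise, 2 = neither).
annotator_a = [0, 0, 1, 1, 2, 2, 0, 1, 2, 0]
annotator_b = [0, 0, 1, 1, 2, 2, 0, 1, 0, 2]

kappa = cohens_kappa(annotator_a, annotator_b)
print(f"kappa = {kappa:.2f}")  # kappa = 0.70
```

Here the annotators agree on 8 of 10 items (0.80 observed) while chance agreement is 0.34, which gives (0.80 - 0.34) / (1 - 0.34), roughly 0.70.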
8 questions · take it digitally for instant feedback at tendril.neural-forge.io/learn/quiz/end-data-sharing-huggingface
What is the main idea of "Sharing Datasets on Hugging Face Hub"?
Which concept is most central to "Sharing Datasets on Hugging Face Hub"?
Which use of AI fits this topic best?
What should a careful learner remember about "Ground your practice in fundamentals"?
You want to use AI after this lesson. What is the safest next step?
How should AI output about Hugging Face be treated?
Name one way to verify an AI answer about Hugging Face.
Which action would help you apply "Sharing Datasets on Hugging Face Hub" responsibly?