Creating Your First Small Labeled Dataset
Creating a dataset from scratch teaches you more than using someone else's. Here is how to build a high-quality small labeled dataset for a real task.
Lesson map
The main moves, in order:
1. Small, Careful, and Documented
2. Labeling
3. Annotation
4. Guidelines
Section 1
Small, Careful, and Documented
A 500-example dataset you built yourself often teaches more than a 50,000-example dataset someone else assembled. Because you struggled with every label, you will notice bias, ambiguity, and quality issues that get papered over at scale. Here is how to do one well.
Project: classify tweets as complaint vs. praise
1. Define the task precisely
2. Collect 300-500 raw examples
3. Write a labeling guideline document
4. Label the examples (ideally with at least 2 annotators)
5. Measure inter-annotator agreement
6. Clean and release as a versioned dataset
Step 1: precise definition
Before looking at any data, write down exactly what counts as a complaint, as praise, and as neither. A definition you cannot apply consistently to real tweets is not precise enough.
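One way to make the definition testable is to pin the label set down in code before any labeling starts. This is a minimal sketch; the names are assumptions, not files from the lesson:

```python
# A minimal sketch: fix the label schema in one place so every later
# script validates against the same definition.
LABELS = {'complaint', 'praise', 'neither'}

def validate_label(label: str) -> str:
    """Normalize a raw annotation and check it against the schema."""
    label = label.strip().lower()
    if label not in LABELS:
        raise ValueError(f'Unknown label: {label!r} (expected one of {sorted(LABELS)})')
    return label
```

Every downstream script (labeling, agreement, release) can then import this one schema instead of re-typing the label strings.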
Step 2: collect raw examples
Sourcing 500 raw examples
import pandas as pd
from datasets import load_dataset

# Option A: use an existing Twitter-style dataset from Hugging Face
raw = load_dataset('cardiffnlp/tweet_eval', 'sentiment', split='train[:500]')
df = raw.to_pandas()[['text']]

# Option B: collect your own via API (with the platform's consent)
# e.g., tweepy for Twitter/X, praw for Reddit

df.to_csv('to_label.csv', index=False)

Step 3: the labeling guideline
A short but complete labeling guide
LABELING GUIDE v1.0
=============================================
Label one of: complaint | praise | neither
COMPLAINT: The post expresses negative feelings
about a product, service, or experience.
e.g., My phone just died for the third time today.
PRAISE: The post expresses positive feelings about
a product, service, or experience.
e.g., This new update is amazing, everything
is so much faster now!
NEITHER: Anything else — questions, statements of
fact, off-topic, jokes without sentiment.
e.g., Does anyone know if iOS 17 supports this?
EDGE CASES:
- Sarcastic praise that is really a complaint -> complaint
- A complaint about a product phrased politely -> complaint
- Mixed (some good, some bad) -> choose the dominant sentiment; break ties with neither
- Non-English -> neither (we are only labeling English for now)

Step 4 & 5: label with two annotators
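Labeling itself can be as simple as a terminal loop that each annotator runs independently. This is a sketch under assumptions: the input file `to_label.csv` matches step 2, and the output filenames are made up for illustration:

```python
import pandas as pd

# A minimal annotation loop (a sketch, not a polished tool).
# Assumes 'to_label.csv' from step 2 with a 'text' column.
VALID = {'c': 'complaint', 'p': 'praise', 'n': 'neither'}

def label_file(in_path: str, out_path: str) -> None:
    df = pd.read_csv(in_path)
    labels = []
    for text in df['text']:
        choice = ''
        while choice not in VALID:
            # Re-prompt until the annotator gives a valid key
            choice = input(f'{text}\n[c]omplaint / [p]raise / [n]either: ').strip().lower()
        labels.append(VALID[choice])
    df['label'] = labels
    df.to_csv(out_path, index=False)

# Each annotator runs it separately, e.g.:
# label_file('to_label.csv', 'labels_anno_a.csv')
```

Keeping one output file per annotator (rather than a shared sheet) makes the agreement check in the next step trivial.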
Two annotators + agreement check
# After labeling, measure agreement
import pandas as pd
from sklearn.metrics import cohen_kappa_score
df = pd.read_csv('labeled.csv') # has columns: text, anno_a, anno_b
kappa = cohen_kappa_score(df['anno_a'], df['anno_b'])
print(f'Cohen kappa: {kappa:.3f}')
# Rule of thumb: kappa above ~0.6 is usually read as substantial
# agreement; below ~0.4, revisit the guideline before labeling more.
# Resolve disagreements by discussion, not by averaging
disagreements = df[df['anno_a'] != df['anno_b']]
print(f'{len(disagreements)} disagreements to resolve')

Step 6: release with a data card
- Write a README with purpose, size, limitations
- Choose a license (CC-BY-4.0 is a safe default for most uses)
- Include labeling guidelines as a file in the repo
- Report inter-annotator agreement
- Version the dataset (v1.0 is better than undated)
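The checklist above can be sketched as a small release script. The filenames and data-card fields here are assumptions for illustration, not a fixed standard:

```python
# A minimal sketch of a versioned release with a data card.
VERSION = 'v1.0'

card = f"""# Tweet Complaint/Praise Dataset ({VERSION})

Purpose: classify tweets as complaint, praise, or neither.
Size: ~500 examples, 2 annotators, disagreements resolved by discussion.
Agreement: report Cohen's kappa from step 5 here.
License: CC-BY-4.0
Limitations: English only; short social-media text; small, non-random sample.
"""

# Versioned README doubles as the data card; ship the labeling
# guideline alongside it in the same repo.
with open(f'README_{VERSION}.md', 'w') as f:
    f.write(card)
print(f'Wrote README_{VERSION}.md')
```

Versioning the README and data files together means anyone citing the dataset can say exactly which labels they used.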
The big idea: the dataset you build teaches you the problem. Every edge case surfaces. Every definition gets stress-tested. Shipping a small, careful, documented dataset is a credential in its own right.
Related lessons
- Labeling at Scale: The Hidden Human Layer (35 min). Behind every supervised model is an army of human labelers. Understanding how labeling works is understanding who really builds AI.
- Inter-Annotator Agreement: Measuring Reality (28 min). If two reasonable humans cannot agree on a label, neither can a model. Inter-annotator agreement tells you whether a task is even well-defined.
- Open vs. Closed Models: Philosophy and Strategy (45 min). Open-source AI is both a technical movement and a political one. Understand the arguments so you can pick a stack and defend it.