Tendril

Lesson 93 of 2244

Qualitative Coding With AI: Inter-Rater Reliability Still Matters

AI can tag interview transcripts orders of magnitude faster than a human coder. That speed is worthless without validation. Here's the honest workflow.

Adults & Professionals · Research & Analysis · ~24 min read · Interactive


The tempting shortcut and its problem

Qualitative coding is the slow heart of interview research: read 40 transcripts, tag every meaningful passage, and let themes emerge. An LLM can do a first pass in minutes, but a first pass is not a finished analysis. Treating it as the final product is a common way for AI-assisted research to get rejected at peer review.

The defensible workflow

1. Have the LLM propose an initial codebook from 3-5 transcripts
2. Human researchers refine the codebook, collapsing redundant codes and naming them carefully
3. The LLM codes all transcripts using the refined codebook
4. Humans independently re-code a random 15-20% sample, without seeing the LLM's codes
5. Calculate inter-rater reliability (Cohen's kappa) between the human and LLM codes, code by code
6. If kappa < 0.7, revise the codebook and repeat steps 3-5
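Steps 4-6 can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed tool: the helper names `draw_validation_sample` and `cohens_kappa` are hypothetical, and it assumes each code is scored as present (1) or absent (0) on each passage, so kappa is computed separately per code. Cohen's kappa compares observed agreement p_o with the agreement expected by chance p_e from each rater's label frequencies: kappa = (p_o - p_e) / (1 - p_e). In practice, scikit-learn's `cohen_kappa_score` computes the same statistic.

```python
import random
from collections import Counter


def draw_validation_sample(transcript_ids, fraction=0.15, seed=42):
    """Pick a reproducible random subset of transcripts for blind human re-coding."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    rng = random.Random(seed)  # fixed seed so the sample can be reported and reproduced
    k = max(1, round(len(transcript_ids) * fraction))
    return sorted(rng.sample(list(transcript_ids), k))


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters giving one categorical label per item."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two equal-length, non-empty label lists")
    n = len(rater_a)
    # Observed agreement: share of items where both raters chose the same label.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement, from each rater's marginal label frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if p_e == 1:
        # Both raters used one identical label everywhere: kappa is 0/0, undefined.
        return float("nan")
    return (p_o - p_e) / (1 - p_e)


# Hypothetical IDs for 40 transcripts; a 15% sample selects 6 of them.
transcripts = [f"T{i:02d}" for i in range(1, 41)]
sample = draw_validation_sample(transcripts, fraction=0.15)

# Hypothetical presence (1) / absence (0) of one code on ten sampled passages.
human = [1, 1, 0, 0, 1, 0, 1, 0, 0, 1]
llm = [1, 0, 0, 0, 1, 0, 1, 1, 0, 1]
kappa = cohens_kappa(human, llm)
print(f"kappa = {kappa:.2f}")  # prints "kappa = 0.60"
if kappa < 0.7:
    print("Below threshold: revise the codebook and re-run steps 3-5")
```

Note that the two raters agree on 8 of 10 passages (80%), yet kappa is only 0.60, because with both raters marking the code on half the passages, 50% agreement is expected by chance alone. That gap is why raw percent agreement is not enough to report.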

What to disclose in the methods section

Which model did the coding (including version and date)
The exact prompt template used for coding
The codebook (as a supplementary appendix)
The human-AI agreement statistics
Any cases where humans overrode the LLM's codes
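One way to keep these disclosure items together is to record them in a structured file as the project runs, then transcribe them into the methods section and appendix. The sketch below assumes Python 3.9+; the class and field names (`CodingDisclosure`, `Override`, `kappa_by_code`) are hypothetical, and every value in the example is a placeholder, not a real model or result.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class Override:
    """One passage where a human replaced the LLM's code."""
    passage_id: str
    llm_code: str
    human_code: str
    reason: str


@dataclass
class CodingDisclosure:
    model_name: str
    model_version: str
    run_date: str             # ISO date the coding run was performed
    prompt_template: str      # exact prompt text, placeholders included
    codebook_appendix: str    # where the codebook appears in the supplement
    validation_fraction: float
    kappa_by_code: dict[str, float] = field(default_factory=dict)
    overrides: list[Override] = field(default_factory=list)

    def to_json(self) -> str:
        # asdict() also converts the nested Override dataclasses into dicts.
        return json.dumps(asdict(self), indent=2)


record = CodingDisclosure(
    model_name="example-model",        # placeholder, not a real model
    model_version="example-version",   # placeholder
    run_date="YYYY-MM-DD",             # placeholder
    prompt_template="Apply the codebook below to this passage: {passage}",
    codebook_appendix="Supplementary Appendix A",
    validation_fraction=0.15,
    kappa_by_code={"example_code": 0.0},  # placeholder, fill in measured values
)
record.overrides.append(
    Override("T07-p12", "example_code", "other_code", "placeholder reason")
)
print(record.to_json())
```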


The big idea: AI accelerates qualitative coding — it does not replace the validation work. Kappa statistics and disclosures are non-negotiable.
