Qualitative Coding With AI: Inter-Rater Reliability Still Matters
AI can tag interview transcripts at 1000x human speed. That speed is worthless without validation. Here's the honest workflow.
40 min · Reviewed 2026
The tempting shortcut and its problem
Qualitative coding is the slow heart of interview research: read 40 transcripts, tag every meaningful passage, let themes emerge. LLMs can do a first pass in minutes. But a first pass is not a finished analysis, and treating it as the final product is how AI-assisted research gets rejected at peer review.
The defensible workflow
Have the LLM propose an initial codebook from 3-5 transcripts
Human researchers refine the codebook, collapsing redundant codes and naming them carefully
LLM codes all transcripts using the refined codebook
Humans independently re-code a random 15-20% sample
Calculate inter-rater reliability (Cohen's kappa) between human and LLM
If kappa < 0.7, revise the codebook and re-run
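The agreement check in steps 4–6 can be sketched in a few lines of Python. The kappa formula is the standard one (observed agreement corrected for chance agreement); the codes and passages below are hypothetical, not from any real study.

```python
# Minimal sketch of the human-vs-LLM agreement check.
# Codes ("barrier", "coping", "support") and data are hypothetical.
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters assigning one code per passage."""
    assert len(rater_a) == len(rater_b) and rater_a
    n = len(rater_a)
    # Observed agreement: fraction of passages where the codes match.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement expected from each rater's marginal code frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    return (p_o - p_e) / (1 - p_e)


human = ["barrier", "barrier", "coping", "coping", "coping", "support"]
llm   = ["barrier", "coping",  "coping", "coping", "support", "support"]

kappa = cohens_kappa(human, llm)
print(f"kappa = {kappa:.2f}")  # 11/23 ≈ 0.48 here: below 0.7, so revise and re-run
```

In practice you would feed in the codes from the independently re-coded 15-20% sample, not toy lists, and libraries such as scikit-learn offer an equivalent `cohen_kappa_score` if you prefer not to hand-roll the formula.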
What to disclose in the methods section
Which model did the coding (including version and date)
The exact prompt template used for coding
The codebook (as a supplementary appendix)
The human-AI agreement statistics
Any cases where humans overrode the LLM's codes
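As an illustration only, the five disclosure items can be kept as a structured record and checked for completeness before submission. Every field name and value here is hypothetical; no journal mandates this particular schema.

```python
# Hypothetical record of the AI-assistance disclosures listed above.
disclosure = {
    "model": "example-llm-4.1 (2026-03 snapshot)",            # model, version, date
    "coding_prompt": "Apply the attached codebook to ...",    # exact prompt template
    "codebook": "Supplementary Appendix A",                   # where the codebook lives
    "agreement": {"sample_fraction": 0.15, "cohens_kappa": 0.74},
    "human_overrides": "12 passages re-coded; listed in Appendix B",
}

# Fail loudly if a required disclosure is missing from the methods section.
required = {"model", "coding_prompt", "codebook", "agreement", "human_overrides"}
missing = required - disclosure.keys()
assert not missing, f"undisclosed items: {sorted(missing)}"
print("all five disclosure items present")
```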
The big idea: AI accelerates qualitative coding — it does not replace the validation work. Kappa statistics and disclosures are non-negotiable.
End-of-lesson check
13 questions · take it digitally for instant feedback at tendril.neural-forge.io/learn/quiz/end-research-qualitative-coding-with-ai-creators
What is the main takeaway from "Qualitative Coding With AI: Inter-Rater Reliability Still Matters"?
AI can tag interview transcripts at 1000x human speed, but that speed is worthless without validation.
AI can substitute for substantive psychological theory.
AI can substitute for the IRB privacy review.
AI can soften defensive language without conceding the point.
Which named statistic does the workflow in "Qualitative Coding With AI: Inter-Rater Reliability Still Matters" use to quantify human-LLM agreement?
thematic analysis
codebook
Cohen's kappa
inter-rater reliability
In "Qualitative Coding With AI: Inter-Rater Reliability Still Matters", which term names the shared list of code definitions that both the LLM and the human coders apply?
codebook
Cohen's kappa
thematic analysis
inter-rater reliability
In "Qualitative Coding With AI: Inter-Rater Reliability Still Matters", which term names the general property measured by comparing independent coders on the same transcripts?
codebook
thematic analysis
inter-rater reliability
Cohen's kappa
Which step comes first in the defensible workflow described in "Qualitative Coding With AI: Inter-Rater Reliability Still Matters"?
Have the LLM propose an initial codebook from 3-5 transcripts
Human researchers refine the codebook, collapsing redundant codes and naming them carefully
LLM codes all transcripts using the refined codebook
Humans independently re-code a random 15-20% sample
Which of these does NOT belong in a discussion of Qualitative Coding With AI: Inter-Rater Reliability Still Matters?
LLM codes all transcripts using the refined codebook
Have the LLM propose an initial codebook from 3-5 transcripts
Substitute AI for substantive psychological theory
Human researchers refine the codebook, collapsing redundant codes and naming them carefully
According to "Qualitative Coding With AI: Inter-Rater Reliability Still Matters", which disclosure item belongs in a supplementary appendix?
The exact prompt template used for coding
The codebook (as a supplementary appendix)
Which model did the coding (including version and date)
The human-AI agreement statistics
Which of these does NOT belong in a discussion of Qualitative Coding With AI: Inter-Rater Reliability Still Matters?
The codebook (as a supplementary appendix)
Which model did the coding (including version and date)
The exact prompt template used for coding
Substitute AI for substantive psychological theory
What is the key insight about "The kappa threshold" in the context of Qualitative Coding With AI: Inter-Rater Reliability Still Matters?
Cohen's kappa above 0.7 is typically considered substantial agreement; above 0.8 is strong.
AI can substitute for substantive psychological theory.
AI can substitute for the IRB privacy review.
AI can soften defensive language without conceding the point.
What is the key insight about "Reviewers increasingly ask these questions" in the context of Qualitative Coding With AI: Inter-Rater Reliability Still Matters?
AI can substitute for substantive psychological theory.
Journals now routinely require AI-assistance disclosures. A methods section that hides the LLM's role will trigger revisions.
AI can substitute for the IRB privacy review.
AI can soften defensive language without conceding the point.
What is the key warning about "Maintain methodological rigour" in the context of Qualitative Coding With AI: Inter-Rater Reliability Still Matters?
AI can substitute for substantive psychological theory.
AI can substitute for the IRB privacy review.
AI-assisted research requires transparent disclosure of tools used, validation of outputs against primary sources, and p…
AI can soften defensive language without conceding the point.
Which statement accurately describes an aspect of Qualitative Coding With AI: Inter-Rater Reliability Still Matters?
AI can substitute for substantive psychological theory.
AI can substitute for the IRB privacy review.
AI can soften defensive language without conceding the point.
Qualitative coding is the slow heart of interview research: read 40 transcripts, tag every meaningful passage, let themes emerge.
Which statement captures the big idea of "Qualitative Coding With AI: Inter-Rater Reliability Still Matters"?
The big idea: AI accelerates qualitative coding — it does not replace the validation work.
AI can substitute for substantive psychological theory.
AI can substitute for the IRB privacy review.
AI can soften defensive language without conceding the point.