Lesson 960 of 2116
Few-Shot Example Curation: Quality, Rotation, and Counter-Examples, Part 1
Chain-of-thought prompts show real performance gains on reasoning tasks — and zero benefit on tasks that don't need reasoning. Here's how to tell which is which.
Lesson map
What this lesson covers
Learning path
The main moves in order
1. The premise
2. Curating Few-Shot Examples for an LLM Prompt — Quality vs. Quantity
3. The premise
4. Using Negative Examples in LLM Prompts — When 'Don't Do This' Helps
Section 1
The premise
Chain-of-thought is not a universal upgrade; it helps on reasoning-bound tasks and is overhead everywhere else.
What AI does well here
- Use CoT on tasks requiring multi-step reasoning (math, complex logic, multi-constraint problems)
- Use few-shot CoT examples on reasoning tasks where the structure of reasoning matters
- Hide CoT from end-user output when the reasoning adds no user-facing value
- Evaluate with and without CoT to confirm benefit on YOUR task
What AI cannot do
- Make non-reasoning tasks better with CoT (it just adds tokens)
- Make CoT a substitute for fine-tuning on hard reasoning tasks
- Trust the reasoning trace as ground truth (models can produce plausible-but-wrong reasoning)
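The sketch below shows one way to act on that last recommendation: run the same small eval set with and without a step-by-step instruction and compare accuracy. `call_model`, the eval cases, and the prompt templates are invented stand-ins, not any particular provider's API.

```python
# Minimal sketch: run the same eval set with and without a CoT instruction and
# compare accuracy. call_model is a hypothetical stand-in for your model client;
# the eval cases and templates are invented.

def call_model(prompt: str) -> str:
    raise NotImplementedError("wire this to your model client")

EVAL_SET = [
    {"q": "A train leaves at 3:40 and arrives at 5:05. How many minutes is the trip?", "expected": "85"},
    {"q": "What is the capital of France?", "expected": "Paris"},
]

TEMPLATES = {
    "plain": "Answer with only the final answer.\n\nQuestion: {q}\nAnswer:",
    "cot": "Think step by step, then give only the final answer on the last line.\n\nQuestion: {q}\nAnswer:",
}

def accuracy(template: str) -> float:
    hits = 0
    for case in EVAL_SET:
        reply = call_model(template.format(q=case["q"]))
        final_line = reply.strip().splitlines()[-1]  # ignore the reasoning lines, if any
        hits += case["expected"].lower() in final_line.lower()
    return hits / len(EVAL_SET)

# Once call_model is wired up:
# for name, template in TEMPLATES.items():
#     print(name, accuracy(template))
```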
Section 2
Curating Few-Shot Examples for an LLM Prompt — Quality vs. Quantity
Section 3
The premise
Few-shot examples teach the model your edge cases and your style — pick them like you're picking a teaching set, not filler.
What AI does well here
- Cover the three most common input shapes you actually see
- Include one tricky edge case your last release got wrong
- Match the exact output format you expect, character for character
- Surface the reasoning step if you want the model to externalize one
What AI cannot do
- Generalize to a regime far outside the chosen examples
- Replace a clear instruction — examples and instructions reinforce each other
- Stay relevant when your data distribution shifts
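As a rough illustration of the points above, here is a minimal sketch of a curated few-shot block: a handful of examples covering the common input shapes plus one edge case, each matching the exact output format expected. The classification task and field names are invented for illustration.

```python
# Minimal sketch of a curated few-shot block: two common input shapes plus one
# edge case, all matching the exact output format we want back. The task and
# JSON fields are invented.

EXAMPLES = [
    # Common shape 1: straightforward positive case.
    {"input": "Order #1182 arrived two days early, thanks!",
     "output": '{"sentiment": "positive", "refund_requested": false}'},
    # Common shape 2: negative with an explicit ask.
    {"input": "The mug was cracked. I want my money back.",
     "output": '{"sentiment": "negative", "refund_requested": true}'},
    # Edge case the last release got wrong: mixed tone, no refund ask.
    {"input": "Shipping was slow but the product itself is great.",
     "output": '{"sentiment": "mixed", "refund_requested": false}'},
]

def build_prompt(new_input: str) -> str:
    parts = ["Classify the message. Respond with JSON exactly matching the examples."]
    for ex in EXAMPLES:
        parts.append(f"Message: {ex['input']}\nJSON: {ex['output']}")
    parts.append(f"Message: {new_input}\nJSON:")
    return "\n\n".join(parts)

print(build_prompt("Where is my package?"))
```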
Section 4
Using Negative Examples in LLM Prompts — When 'Don't Do This' Helps
Section 5
The premise
Showing the model an explicit wrong answer alongside the right one prevents specific failure modes — but can also seed them if done sloppily.
What AI does well here
- Pair every negative example with the corrected version
- Label them clearly: 'BAD' and 'GOOD' tagged blocks
- Use sparingly — one or two negatives, not ten
- Reserve them for failures you have actually observed
What AI cannot do
- Substitute for clear positive examples
- Stop the model from repeating the bad pattern in adjacent contexts
- Generalize 'don't do X' beyond the literal example
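A minimal sketch of one negative example used sparingly: the BAD block is clearly labeled and paired with its GOOD correction, and both sit next to the ordinary instruction rather than replacing it. The policy-answering task and the specific failure shown are invented.

```python
# Minimal sketch: one negative example, clearly labeled and paired with its
# corrected version, placed alongside the normal instruction. The task and the
# failure shown are invented for illustration.

NEGATIVE_PAIR = """BAD (invents a policy that was never stated):
Q: Can I return a swimsuit?
A: Yes, swimwear can be returned within 90 days.

GOOD (corrected: answers only from the provided policy text):
Q: Can I return a swimsuit?
A: The policy excerpt doesn't mention swimwear, so I can't confirm; please check with support."""

def build_prompt(policy_excerpt: str, question: str) -> str:
    return (
        "Answer using only the policy excerpt below.\n\n"
        f"Policy excerpt:\n{policy_excerpt}\n\n"
        f"{NEGATIVE_PAIR}\n\n"
        f"Q: {question}\nA:"
    )

print(build_prompt("Returns accepted within 30 days for unworn clothing.", "Can I return a hat?"))
```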
Section 6
Rotating Few-Shot Examples to Prevent Overfitting
Section 7
The premise
Maintaining an example pool larger than what fits in the prompt, and sampling N examples per call with a stable hash, keeps selection reproducible while preventing the model from locking onto one fixed set.
What AI does well here
- Reduce mimicry of one phrasing
- Surface examples evenly over time
- Detect example-set bugs faster
What AI cannot do
- Replace evaluation against held-out cases
- Compensate for biased pool composition
- Guarantee any single output's quality
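A minimal sketch of that rotation scheme, assuming a per-request key (for example, the user input or a request ID) is used as the stable seed. Only the standard library is used; the pool contents are placeholders.

```python
import hashlib

# Minimal sketch: pick N examples from a larger pool, ranked by a stable hash of
# (request_key, example). Unlike random.sample, this is reproducible across
# processes and runs, so the same request always sees the same examples.

POOL = [f"example {i}" for i in range(20)]   # stand-in for your curated example pool
N_PER_CALL = 4

def pick_examples(request_key: str, pool=POOL, n=N_PER_CALL):
    def score(example: str) -> str:
        return hashlib.sha256(f"{request_key}|{example}".encode()).hexdigest()
    return sorted(pool, key=score)[:n]

print(pick_examples("user-query-123"))
print(pick_examples("user-query-123"))  # same key -> same examples
print(pick_examples("user-query-456"))  # different key -> a different slice of the pool
```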
Section 8
Prompting AI: few-shot examples that actually transfer
Section 9
The premise
Few-shot examples teach the model your output shape and your edge-case handling. Examples that all look alike teach only the easy case; the model fails on anything off-distribution.
What AI does well here
- Match the format of provided examples in new outputs
- Generalize patterns shown across diverse examples
- Handle cases similar to ones you demonstrated
What AI cannot do
- Generalize from examples that all look the same
- Recover gracefully from a case unlike any example
- Tell you when an example was a poor choice
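One way to avoid an example set where everything looks alike is to bucket candidates by a coarse notion of input shape and pick one per bucket. The sketch below does this with an invented heuristic; the shapes that matter for your task will differ.

```python
# Minimal sketch: bucket candidate examples by a coarse input shape and keep one
# per bucket, so the chosen set doesn't all look alike. The heuristic and the
# candidate texts are invented; substitute whatever distinguishes your real inputs.

CANDIDATES = [
    "Reset my password",
    "Why was I charged twice on March 3rd for invoice INV-2291?",
    "cancel",
    "I love the new dashboard, just wanted to say thanks!",
    "How do I export all of last quarter's reports as CSV?",
]

def input_shape(text: str) -> str:
    if len(text.split()) <= 2:
        return "terse"
    if any(ch.isdigit() for ch in text):
        return "has identifiers"
    return "conversational"

def pick_diverse(candidates):
    chosen = {}
    for c in candidates:
        chosen.setdefault(input_shape(c), c)  # keep the first example seen per shape
    return list(chosen.values())

print(pick_diverse(CANDIDATES))  # one example per shape, in first-seen order
```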
Section 10
AI Prompting: Choose Few-Shot vs Fine-Tune Without Burning a Quarter
Section 11
The premise
Teams over-invest in fine-tuning when 5-10 strong few-shot examples would solve the task; they also avoid fine-tuning when the cost arithmetic actually favors it.
What AI does well here
- Score whether the task is style or knowledge
- Estimate prompt-token cost with examples included
- Compare against fine-tune training and inference cost
- Recommend evals to compare both
What AI cannot do
- Account for hidden ops cost of maintaining a fine-tune
- Predict whether the model provider will release a better base model
- Replace a real eval comparison
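The cost comparison is back-of-envelope arithmetic, sketched below. Every number is an assumed placeholder to be replaced with your provider's actual pricing and your actual traffic, and the hidden ops cost of maintaining a fine-tune is not captured here.

```python
# Back-of-envelope only: every figure below is an assumed placeholder, not real
# pricing. The hidden ops cost of owning a fine-tune (retraining, monitoring,
# base-model upgrades) is deliberately not modeled.

calls_per_month = 500_000
example_tokens = 1_200                 # extra prompt tokens the few-shot examples add per call
input_price_per_1k = 0.0005            # assumed $ per 1K input tokens

fine_tune_training_cost = 400.0        # assumed one-off training cost
ft_inference_premium_per_1k = 0.0002   # assumed extra $ per 1K tokens on the tuned model
tokens_without_examples = 600          # average prompt size once the examples are removed

few_shot_monthly = calls_per_month * example_tokens / 1000 * input_price_per_1k
fine_tune_monthly = calls_per_month * tokens_without_examples / 1000 * ft_inference_premium_per_1k

print(f"few-shot extra cost per month:  ${few_shot_monthly:,.0f}")
print(f"fine-tune extra cost per month: ${fine_tune_monthly:,.0f} (+ ${fine_tune_training_cost:,.0f} once)")
```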
Section 12
AI and few-shot example selection
Section 13
The premise
Few-shot examples teach the model the shape of the answer. Choosing diverse, edge-leaning examples beats stacking similar ones.
What AI does well here
- Suggest covering edge cases in examples.
- Help format examples consistently.
- Spot when examples contradict each other.
What AI cannot do
- Know which examples your model will weight most.
- Replace systematic eval.
- Guarantee a behavior change from one swap.
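Two cheap, mechanical checks cover part of this ground: every example output uses the same fields, and no input appears twice with conflicting outputs. A sketch with invented example data follows; neither check replaces a systematic eval.

```python
import json

# Minimal sketch of two sanity checks on an example set: consistent output fields
# across all examples, and no contradictory outputs for the same input. The
# example data is invented for illustration.

EXAMPLES = [
    {"input": "great product", "output": '{"sentiment": "positive"}'},
    {"input": "arrived broken", "output": '{"sentiment": "negative"}'},
    {"input": "great product", "output": '{"sentiment": "positive"}'},
]

def check_examples(examples):
    key_sets = {frozenset(json.loads(ex["output"]).keys()) for ex in examples}
    assert len(key_sets) == 1, f"outputs use inconsistent fields: {key_sets}"

    seen = {}
    for ex in examples:
        prior = seen.setdefault(ex["input"], ex["output"])
        assert prior == ex["output"], f"contradictory outputs for input: {ex['input']!r}"

check_examples(EXAMPLES)
print("example set is internally consistent")
```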
Key terms in this lesson
- chain-of-thought
- reasoning
- few-shot
- test-time compute
- step-by-step
- latency tradeoff
- example selection
- in-context learning
- quality
- negative examples
- anti-patterns
- prompt design
- example rotation
- overfitting
- robustness
- few-shot prompting
- distribution coverage
- fine-tune
- cost arithmetic
- model lifecycle
- diversity
Related lessons
Keep going
Builders · 40 min
Chain-of-Thought for Builders: Make AI Show Its Reasoning
Force AI to explain its reasoning out loud, and you'll catch its mistakes faster.
Builders · 40 min
Few-Shot Prompting: Teaching AI by Showing Examples
Tell AI 'don't do it like this' with a real bad example, and it learns the line you're drawing.
Explorers · 40 min
Ask AI to Think Step by Step
When you want AI to do something tricky, ask it to think step by step. The answer comes out smarter.
