Choosing Between AI Models: Capability, Cost, Latency
A practical framework for picking the right model for each task.
11 min · Reviewed 2026
The premise
There is no single best model — there is a frontier of capability, cost, and latency tradeoffs. The skill is matching task to model and revisiting that choice as the frontier moves.
What AI does well here
Mapping tasks by complexity, latency budget, and cost sensitivity (a routing sketch follows this list)
Using small fast models for classification and extraction
Reserving frontier models for reasoning, coding, and judgment-heavy tasks
Re-evaluating choices as new model versions ship
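To make the mapping concrete, here is a minimal sketch of a task-to-model table in Python. The model names, task categories, and the 500 ms cutoff are illustrative placeholders, not recommendations from this lesson.

```python
# Minimal task-to-model map. Names and categories are hypothetical.
MODEL_FOR_TASK = {
    "classify": "small-fast-model",   # classification: cheap and quick
    "extract": "small-fast-model",    # extraction: structure, not judgment
    "code": "frontier-model",         # coding: reasoning-heavy
    "research": "frontier-model",     # multi-step synthesis
}

def pick_model(task_type: str, latency_budget_ms: int) -> str:
    """Choose a model tier by task type; prefer the fast tier when the
    latency budget is tight (the 500 ms cutoff is arbitrary)."""
    if latency_budget_ms < 500:
        return "small-fast-model"
    return MODEL_FOR_TASK.get(task_type, "frontier-model")
```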
What AI cannot do
Pick once and be done — the frontier moves every few months
Trust benchmark scores blindly — they often diverge from your task
Avoid the work of running your own evals across candidate models (a minimal harness sketch follows this list)
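A minimal eval-harness sketch, assuming a `call_model` stub you would replace with your provider's client; the exact-match scorer is a toy stand-in for a task-appropriate metric.

```python
import time

def call_model(model: str, prompt: str) -> str:
    # Hypothetical stub: swap in your provider's SDK call here.
    return "stub output"

def exact_match(output: str, expected: str) -> float:
    # Toy scorer; replace with a metric that fits your task.
    return 1.0 if output.strip() == expected.strip() else 0.0

def run_eval(models, examples):
    """Run every example through every candidate model and record
    average quality and latency."""
    results = {}
    for model in models:
        quality, latencies = 0.0, []
        for prompt, expected in examples:
            start = time.perf_counter()
            output = call_model(model, prompt)
            latencies.append(time.perf_counter() - start)
            quality += exact_match(output, expected)
        results[model] = {
            "avg_quality": quality / len(examples),
            "avg_latency_s": sum(latencies) / len(latencies),
        }
    return results
```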
End-of-lesson check
15 questions · take it online for instant feedback at tendril.neural-forge.io/learn/quiz/end-ai-foundations-model-selection-final1-creators
Is there a single 'best' model?
Yes, GPT-5 always wins
No — there's a frontier of capability, cost, and latency tradeoffs
Yes, the cheapest one always
Yes, the slowest one always
Which task fits a small fast model?
Open-ended legal reasoning
Multi-step research synthesis
Classification and extraction
Original creative writing
Which task warrants a frontier model?
Spam classification
Trivial extraction
Splitting a string
Reasoning, coding, and judgment-heavy work
How should you re-evaluate model choices over time?
Re-run your evals as new model versions ship
Pick once and forget
Switch every day
Switch only on holidays
What is recommended when comparing three tasks and three models?
Pick by gut
Run each task through each model on 10 examples and score quality, latency, cost
Pick the most expensive
Pick by influencer recommendation
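To make the winning answer above concrete: once you have per-task scores, picking a model can be mechanized. The numbers and the cost ceiling below are made up for illustration.

```python
# task_results[model] = {"quality": ..., "latency_s": ..., "cost_usd": ...}
# Illustrative numbers, not measurements.
classify_results = {
    "small-fast-model": {"quality": 0.92, "latency_s": 0.3, "cost_usd": 0.0001},
    "frontier-model":   {"quality": 0.95, "latency_s": 2.1, "cost_usd": 0.0050},
}

def pick_winner(task_results, max_cost_usd):
    """Best quality among models under the cost ceiling; fall back to the
    full pool if nothing is affordable."""
    affordable = {m: r for m, r in task_results.items()
                  if r["cost_usd"] <= max_cost_usd}
    pool = affordable or task_results
    return max(pool, key=lambda m: pool[m]["quality"])

print(pick_winner(classify_results, max_cost_usd=0.001))  # small-fast-model
```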
Why distrust public benchmark scores blindly?
Benchmarks are illegal
Benchmarks are random
Benchmarks often diverge from your specific task
Benchmarks change daily
How often does the frontier 'move'?
Every five years
Never
Every decade
Roughly every few months
What is 'model routing'?
Sending each request to the best-fit model for that task
Picking a router brand
A type of cable
A networking standard
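In code, routing can be as simple as a cheap first pass that labels the request, then a dispatch on the label. The keyword heuristic below stands in for a small classifier model; all names are hypothetical.

```python
def classify_request(prompt: str) -> str:
    # Stand-in for a small fast classifier; keywords keep the sketch
    # self-contained.
    hard_markers = ("prove", "refactor", "debug", "synthesize")
    return "hard" if any(w in prompt.lower() for w in hard_markers) else "easy"

def route(prompt: str) -> str:
    """Send each request to the best-fit model for that task."""
    return "frontier-model" if classify_request(prompt) == "hard" else "small-fast-model"

print(route("Extract the invoice date"))          # small-fast-model
print(route("Refactor this module for clarity"))  # frontier-model
```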
Which is true about cost sensitivity?
All tasks demand the cheapest
Some tasks tolerate latency or quality drops to save cost
All tasks demand the most expensive
Cost is irrelevant
Why do quality scores often differ from latency rankings?
Latency and quality are identical
Latency is random
Bigger models are generally smarter but slower
Latency is always best on the largest model
Which is the right frame for 'frontier vs efficient' models?
Use only frontier
Use only efficient
Pick at random
Use efficient models where they're enough; reserve frontier for hard cases
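One way to act on that frame is a cascade: try the efficient model first and escalate only when its answer fails your own confidence check. The `call` and `confident` hooks below are hypothetical; you supply both.

```python
def cascade(prompt, call, confident):
    """Efficient model first; escalate to the frontier model only when
    the cheap answer fails the caller's confidence check."""
    answer = call("small-fast-model", prompt)
    return answer if confident(answer) else call("frontier-model", prompt)
```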
A model wins MMLU. What does that mean for your specific extraction task?
Little — run your own eval before committing
Everything — adopt immediately
Nothing — never use it
It's banned
What is the role of latency budget in model choice?
Sets the maximum cost
Defines the upper bound on response time you'll accept
Sets training time
Sets the model name
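A latency budget can also be enforced mechanically. Here is a sketch using Python's standard library, assuming `call` is whatever client function you use; on timeout it returns None so the caller can fall back to a faster model.

```python
from concurrent.futures import ThreadPoolExecutor, TimeoutError

def call_with_budget(call, prompt, budget_s=2.0):
    """Return the answer only if it arrives within the budget; otherwise
    return None. Note the worker thread still runs to completion."""
    pool = ThreadPoolExecutor(max_workers=1)
    try:
        return pool.submit(call, prompt).result(timeout=budget_s)
    except TimeoutError:
        return None  # budget exceeded; caller falls back
    finally:
        pool.shutdown(wait=False)
```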
Why does a 'representative sample' matter in evals?
Sample size is irrelevant
Random data is best
Eval results only generalize if the sample reflects real production data
Toy examples suffice
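One way to build such a sample is stratified sampling over production logs, preserving each category's share of real traffic. Treating `logs` as a list of dicts with a `category_key` field is an assumption of this sketch.

```python
import random
from collections import defaultdict

def representative_sample(logs, category_key, n):
    """Stratified sample that keeps each category's share of production
    traffic, so eval scores generalize."""
    buckets = defaultdict(list)
    for row in logs:
        buckets[row[category_key]].append(row)
    sample = []
    for rows in buckets.values():
        k = max(1, round(n * len(rows) / len(logs)))
        sample.extend(random.sample(rows, min(k, len(rows))))
    return sample
```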
Which mindset best fits model selection?
Pick the brand you like
Stick with one model forever
Switch every release
Match task to model and re-check as the frontier moves