Tendril — AI Lessons for Real Life

How Agents Go Wrong

Agents fail in funny and scary ways: booking the wrong flight, emailing the wrong person, running up bills.

Most failures come from misunderstanding the goal: agents are great at following directions but bad at noticing when they are heading off course.

Three common failure modes

Misinterpreting ambiguous instructions

Continuing too long without checking in

Using the wrong tool for a task
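
Each of the three failure modes can be met with a small check that stops the agent and hands control back to a person. The sketch below is one hypothetical way to do that, not a standard library or a real agent framework: the names Guardrails, NeedsHuman, resolve_recipient, count_step, and check_tool are all invented for illustration, and the contact list reuses this lesson's "Email Mom" example with made-up addresses.

```python
from dataclasses import dataclass, field


class NeedsHuman(Exception):
    """Raised when the agent should stop and check in with a person."""


@dataclass
class Guardrails:
    # All fields are illustrative assumptions, not part of any real framework.
    max_steps: int = 5                                  # hard cap on steps per task
    allowed_tools: dict = field(default_factory=dict)   # task kind -> set of tool names
    steps_used: int = 0

    def resolve_recipient(self, name, contacts):
        """Failure mode 1: refuse to guess when an instruction is ambiguous."""
        matches = [c for c in contacts if c["label"].lower() == name.lower()]
        if len(matches) != 1:
            raise NeedsHuman(f"{len(matches)} contacts match {name!r}; ask which one")
        return matches[0]

    def count_step(self):
        """Failure mode 2: stop and check in once the step budget is used up."""
        self.steps_used += 1
        if self.steps_used > self.max_steps:
            raise NeedsHuman(f"step budget of {self.max_steps} exhausted; check in")

    def check_tool(self, task_kind, tool):
        """Failure mode 3: only allow tools approved for this kind of task."""
        if tool not in self.allowed_tools.get(task_kind, set()):
            raise NeedsHuman(f"tool {tool!r} is not approved for {task_kind!r}")


# Two contacts share the label "Mom", so the agent must ask instead of guessing.
contacts = [
    {"label": "Mom", "email": "mom@example.com"},
    {"label": "Mom", "email": "roommates.mom@example.com"},
]
rails = Guardrails(max_steps=3, allowed_tools={"email": {"send_email"}})

try:
    rails.check_tool("email", "send_email")   # approved tool, passes silently
    rails.count_step()                         # first of three allowed steps
    rails.resolve_recipient("Mom", contacts)   # ambiguous, raises NeedsHuman
except NeedsHuman as reason:
    print("Paused for a human:", reason)
```

The design choice here is that every check fails closed: when the agent is unsure, it raises and waits rather than picking the most likely option and carrying on.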

The big idea: Agents fail in ways that are funny in retrospect — but only if you caught them in time.

End-of-lesson check

15 questions · take it digitally for instant feedback at tendril.neural-forge.io/learn/quiz/end-builders-agentic-agent-failures

Which sentence best captures the main idea of 'How Agents Go Wrong'?

Agents should always run without limits or oversight
Agents and chatbots are the same thing in every way
Tools and goals are unnecessary for agent design
Agents fail in funny and scary ways: booking the wrong flight, emailing the wrong person, running up bills.

Which of the following is part of 'A real failure'?

Always run with no oversight
Disable safety checks for speed
Hide tool calls from the operator
You: 'Email Mom about the party.' Agent emails 'Mom' from your contacts — your roommate's mom, also called Mom in your phone.

Which of the following is part of 'Three common failure modes'?

Never log what the agent did
Disable safety checks for speed
Misinterpreting ambiguous instructions
Hide tool calls from the operator

Which of the following is part of 'Review date'?

Skip every form of evaluation
Run unbounded retries on any error
Reviewed in 2026. Treat fast-changing product names, prices, availability, and policy details as examples to verify before use.
Avoid taking any actions in the world

What is 'ambiguity' in this context?

A trick to bypass approvals
A way to disable the agent's tools
A reason to skip all logging
A core concept covered in How Agents Go Wrong

What is 'overconfidence' in this context?

A core concept covered in How Agents Go Wrong
A way to disable the agent's tools
A reason to skip all logging
A trick to bypass approvals

What are 'guardrails' in this context?

A trick to bypass approvals
A reason to skip all logging
A way to disable the agent's tools
A core concept covered in How Agents Go Wrong

Which is the most informative thing to look at after an agent failure?

Just the user message
The model's release notes
The full trace: prompts, tool calls, returned errors, and decisions
Only the final output

Which budget control most directly prevents runaway costs from an agent loop?

A bigger model
A friendly system prompt
A longer context window
A hard cap on steps, tokens, or dollars per task

Why does an AI agent need 'tools' such as a browser, calendar, or code runner?

Tools shrink the context window
Tools let the agent take actions in the world instead of only producing text
Tools replace the need for any prompts
Tools make the model speak more naturally

What is the best response when an agent suggests an action you do not understand?

Reject everything and stop using the agent
Ask the agent to explain the action and its expected effect before approving
Approve it to keep things moving
Run it twice to be sure

Why does a multi-agent system sometimes outperform a single agent on complex jobs?

Multiple agents always cost less
Specialized roles can divide work and check each other
More agents always means more accuracy
Single agents cannot use tools

Why is logging every tool call an agent makes a baseline requirement?

Logs are needed to debug, audit, and explain agent behavior to users
Logs replace the need for testing
Logs are only for legal teams
Logs make the model run faster

What does an 'eval' for an agent measure?

The exact wording of every prompt
The temperature setting
Whether the agent reliably completes a defined task end to end
How polite the model sounds

Why are clear success criteria critical when building an agent?

They make the agent sound smarter
Without them you cannot tell whether the agent worked; you can only guess
They are required by law
They reduce the number of tokens used
