The premise
Prompt injection is the AI equivalent of SQL injection: untrusted input manipulates system behavior. The defenses follow from that analogy: trust boundaries must be explicit, and tools must be isolated.
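The premise can be sketched as a three-layer trust model: system prompts are trusted, user input is semi-trusted, and external content is untrusted. A minimal sketch, assuming hypothetical names (the `TrustLayer` enum and delimiter scheme are illustrative, not from the lesson):

```python
from enum import Enum

class TrustLayer(Enum):
    TRUSTED = "system_prompt"        # authored by the developer
    SEMI_TRUSTED = "user_input"      # validated before execution
    UNTRUSTED = "external_content"   # web pages, emails, API responses

def wrap_for_prompt(text: str, layer: TrustLayer) -> str:
    """Tag non-trusted content so the model treats it as data, not instructions."""
    if layer is TrustLayer.TRUSTED:
        return text
    # Delimiters alone are not a complete defense, but they make the
    # trust boundary explicit inside the assembled prompt.
    return f"<{layer.value}>\n{text}\n</{layer.value}>"
```

The key design choice is that nothing outside the system prompt is ever concatenated into the prompt unmarked: every boundary crossing is visible.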
What AI does well here
- Map every untrusted input source reaching your prompts.
- Draft tool-call validation rules with named owners.
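A tool-call validation rule of the kind described above can be drafted as data: which input layer may trigger the tool, the gate that runs before execution, and a named owner accountable for the rule. A minimal sketch with illustrative tool names, layers, and owners:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class ToolRule:
    tool: str
    allowed_layers: set[str]       # which trust layers may trigger this tool
    gate: Callable[[dict], bool]   # validation gate run before execution
    owner: str                     # named human accountable for the rule

# Illustrative registry; in practice each entry would come from a review process.
RULES = {
    "send_email": ToolRule(
        tool="send_email",
        allowed_layers={"user_input"},               # never triggered by external content
        gate=lambda args: "@" in args.get("to", ""), # minimal argument check
        owner="alice@example.com",                   # hypothetical owner
    ),
}

def may_execute(tool: str, source_layer: str, args: dict) -> bool:
    rule = RULES.get(tool)
    if rule is None:
        return False  # default-deny: unregistered tools never run
    return source_layer in rule.allowed_layers and rule.gate(args)
```

Default-deny matters here: a tool without a rule (and an owner) simply cannot be called, which forces the mapping exercise to be complete.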
What AI cannot do
- Eliminate prompt injection risk entirely.
- Replace human review for high-impact tool calls.
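Because AI cannot replace human review for high-impact calls, a dispatcher can route those calls to mandatory review instead of auto-executing them. A sketch, assuming a hypothetical impact tier and tool names:

```python
# Illustrative set of high-impact tools; the real tier list is policy-specific.
HIGH_IMPACT = {"delete_records", "transfer_funds", "grant_access"}

def dispatch(tool: str, args: dict, human_approved: bool = False) -> str:
    """Execute low-impact tools; queue high-impact tools for human review."""
    if tool in HIGH_IMPACT and not human_approved:
        # Model output alone never authorizes a high-impact action.
        return "pending_review"
    return "executed"
```

The point is structural, not cosmetic: approval is enforced in the dispatcher, so a successfully injected prompt still cannot complete a high-impact action on its own.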
End-of-lesson check
15 questions · take it digitally for instant feedback at tendril.neural-forge.io/learn/quiz/end-creators-prompt-injection-fundamentals
In the context of AI security, what is prompt injection most analogous to?
- Cross-site scripting, where malicious code runs in a user's browser
- Denial of service, where service availability is disrupted
- SQL injection, where untrusted input manipulates system behavior
- Buffer overflow, where memory boundaries are exceeded
Which statement best describes a trust boundary in an AI agent system?
- An encryption layer protecting data at rest
- A physical barrier preventing unauthorized hardware access
- The dividing line between trusted system prompts and untrusted external inputs
- A validation checkpoint that only runs on weekends
In the three-layer trust model presented, how is user input categorized?
- Untrusted, because users can always be malicious
- Semi-trusted, because it requires validation before execution
- Fully trusted, because the user owns the system
- Ignored, because it is always overwritten by system prompts
According to the concepts covered, which capability is beyond what AI can currently achieve regarding prompt injection?
- Identifying repeated attack attempts
- Filtering known injection patterns
- Eliminating prompt injection risk entirely
- Detecting obvious malicious keywords in prompts
What is 'indirect injection' in the context of prompt injection attacks?
- When two system prompts conflict with each other
- When external data processed by the AI contains hidden malicious instructions
- When an AI model generates harmful output on its own
- When a user directly types malicious commands into a prompt
What is the primary purpose of tool isolation in agent systems?
- To simplify the user interface
- To prevent a compromised tool from accessing other system components
- To reduce the cost of running AI models
- To make the system run faster
What does the lesson say about relying on user confirmation to prevent prompt injection damage?
- User confirmation is the most effective defense
- Design for safety, not for blame transfer via confirmations
- User confirmation eliminates all liability
- Users should confirm every action the AI takes
When mapping trust boundaries for an agent, what should be specified for each tool?
- The number of lines in its documentation
- The programming language it was written in
- What input layer can trigger it and the validation gate before execution
- The tool's source code complexity
What type of content should be treated as untrusted in an AI agent system?
- Any external content reaching the model
- Only content stored in databases
- Only content from unknown IP addresses
- Only content typed directly by users
Why is output validation important in agent systems?
- To make the AI respond faster
- To improve the aesthetic quality of responses
- To ensure the AI doesn't generate harmful or unintended tool calls
- To reduce the cost of API calls
In the trust model, which layer should be considered fully trusted?
- System prompts
- Third-party plugin outputs
- External API responses
- User input
What should trigger validation gates in an agent system?
- Only malformed inputs
- Only inputs from new users
- Any input crossing a trust boundary
- Only inputs over 1000 characters
Why can't AI completely solve the prompt injection problem?
- Because AI models are not powerful enough
- Because governments have banned AI security research
- Because distinguishing malicious instructions from legitimate requests is fundamentally difficult
- Because prompt injection doesn't really exist
What does the lesson recommend regarding tool-call validation rules?
- They should be changed daily
- They should have named owners responsible for them
- They should only be written for critical tools
- They should be created without any specific owner
In a prompt injection attack, what makes indirect injection particularly dangerous?
- It requires the user to type special characters
- It requires physical access to the server
- The malicious instructions come from a source the user trusts but cannot verify
- It only works on weekends