Tendril — AI Lessons for Real Life

Tendril

The premise

Each vendor publishes a hierarchy spec, but the actual behavior under conflict varies and matters for security.

What AI does well here

Place trusted instructions in the highest-priority role

Test conflict cases before relying on hierarchy

Use hierarchy to reduce prompt-injection blast radius

What AI cannot do

Prevent all jailbreaks

Trust hierarchy as a sole defense

Predict cross-version changes

End-of-lesson check

15 questions · take it digitally for instant feedback at tendril.neural-forge.io/learn/quiz/end-model-families-AI-and-instruction-hierarchy-creators

In AI model architectures, what does the term 'instruction hierarchy' refer to?

A security protocol that encrypts all model outputs
A ranking system that determines which set of instructions takes precedence when multiple instructions conflict
A technique for measuring model response speed under different loads
A method for organizing user queries by topic in a knowledge base

Which role typically occupies the highest priority level in most AI model instruction hierarchies?

Assistant role
System role
User role
Developer role

Why is it important for developers to test conflict cases with different instruction sources before deploying a model in production?

To discover how the model actually prioritizes conflicting instructions in practice
To verify the model generates longer responses
To measure how much memory the model uses
To ensure the model can process messages faster

What security concept describes the limit of damage when a prompt injection attack succeeds?

Attention span
Blast radius
Gradient descent
Latency window

Based on the lesson, can instruction hierarchy completely prevent all jailbreak attacks?

Yes, but only for models with more than 100 billion parameters
Yes, hierarchy makes jailbreaks mathematically impossible
No, but it eliminates the need for any other security measures
No, it raises the bar for attackers but cannot stop all jailbreaks

How do the major model families (Claude, GPT, Gemini) differ in handling instruction hierarchy conflicts?

Claude and Gemini ignore hierarchy entirely
They all use identical hierarchy implementations
Each vendor implements hierarchy differently, causing varying behavior under conflict
Only GPT publishes hierarchy specifications

What is a prompt injection attack?

A method for making models run faster using code injection
An attempt to manipulate model behavior by embedding malicious instructions in user inputs
A process for training models on additional data
A technique for increasing model context window size

Why should instruction hierarchy not be trusted as the sole defense mechanism?

Because it only works with closed-source models
Because attackers can find ways to bypass hierarchy protections and no single defense is foolproof
Because users can easily override any hierarchy
Because hierarchy slows down model responses too much

A prompt injection attack successfully manipulates a model. What does instruction hierarchy help limit in this scenario?

The number of tokens the model can generate
The cost of running the model
The speed at which the attack spreads to other users
The scope of damage the attack can cause by containing it to lower-priority roles

What is the security benefit of placing trusted instructions in the highest-priority role?

They will execute faster than other instructions
They are stored in a more secure database
Those instructions cannot be overridden by lower-priority conflicting instructions
They become visible to all users automatically

What does the lesson mean by 'cross-version changes' in model behavior?

Variations in model output format
Changes in model pricing across different regions
Differences in training data between versions
Updates to a model that may alter how it handles instruction hierarchy conflicts

What is 'role conflict' in the context of instruction hierarchy?

When users have conflicting preferences about model behavior
When the model cannot decide between multiple valid responses
When two developers submit conflicting code changes
When instructions from different roles (system, developer, user) provide contradictory directions

What does 'tool-side checks' refer to in model security?

Monitoring user behavior patterns
Checking the hardware specifications where models run
Auditing developer access to model APIs
Validating outputs that tools produce before allowing them to be returned to the user

The lesson mentions testing with 20 variants of conflicting instructions. What is the purpose of testing this many variants?

To increase the model's accuracy score
To comprehensively map how the model handles different types of hierarchy conflicts
To reduce the model's memory usage
To train the model to handle more instructions

What improvement does instruction hierarchy provide against prompt injection attacks, even if it cannot prevent them all?

It eliminates the need for any user monitoring
It makes attacks completely invisible to users
It raises the difficulty for attackers and creates a first line of defense
It automatically patches vulnerabilities in real-time

The premise

Each vendor publishes a hierarchy spec, but the actual behavior under conflict varies and matters for security.

What AI does well here

Place trusted instructions in the highest-priority role

Test conflict cases before relying on hierarchy

Use hierarchy to reduce prompt-injection blast radius

What AI cannot do

Prevent all jailbreaks

Trust hierarchy as a sole defense

Predict cross-version changes

End-of-lesson check

15 questions · take it digitally for instant feedback at tendril.neural-forge.io/learn/quiz/end-model-families-AI-and-instruction-hierarchy-creators

In AI model architectures, what does the term 'instruction hierarchy' refer to?

A security protocol that encrypts all model outputs
A ranking system that determines which set of instructions takes precedence when multiple instructions conflict
A technique for measuring model response speed under different loads
A method for organizing user queries by topic in a knowledge base

Which role typically occupies the highest priority level in most AI model instruction hierarchies?

Assistant role
System role
User role
Developer role

Why is it important for developers to test conflict cases with different instruction sources before deploying a model in production?

To discover how the model actually prioritizes conflicting instructions in practice
To verify the model generates longer responses
To measure how much memory the model uses
To ensure the model can process messages faster

What security concept describes the limit of damage when a prompt injection attack succeeds?

Attention span
Blast radius
Gradient descent
Latency window

Based on the lesson, can instruction hierarchy completely prevent all jailbreak attacks?

Yes, but only for models with more than 100 billion parameters
Yes, hierarchy makes jailbreaks mathematically impossible
No, but it eliminates the need for any other security measures
No, it raises the bar for attackers but cannot stop all jailbreaks

How do the major model families (Claude, GPT, Gemini) differ in handling instruction hierarchy conflicts?

Claude and Gemini ignore hierarchy entirely
They all use identical hierarchy implementations
Each vendor implements hierarchy differently, causing varying behavior under conflict
Only GPT publishes hierarchy specifications

What is a prompt injection attack?

A method for making models run faster using code injection
An attempt to manipulate model behavior by embedding malicious instructions in user inputs
A process for training models on additional data
A technique for increasing model context window size

Why should instruction hierarchy not be trusted as the sole defense mechanism?

Because it only works with closed-source models
Because attackers can find ways to bypass hierarchy protections and no single defense is foolproof
Because users can easily override any hierarchy
Because hierarchy slows down model responses too much

A prompt injection attack successfully manipulates a model. What does instruction hierarchy help limit in this scenario?

The number of tokens the model can generate
The cost of running the model
The speed at which the attack spreads to other users
The scope of damage the attack can cause by containing it to lower-priority roles

What is the security benefit of placing trusted instructions in the highest-priority role?

They will execute faster than other instructions
They are stored in a more secure database
Those instructions cannot be overridden by lower-priority conflicting instructions
They become visible to all users automatically

What does the lesson mean by 'cross-version changes' in model behavior?

Variations in model output format
Changes in model pricing across different regions
Differences in training data between versions
Updates to a model that may alter how it handles instruction hierarchy conflicts

What is 'role conflict' in the context of instruction hierarchy?

When users have conflicting preferences about model behavior
When the model cannot decide between multiple valid responses
When two developers submit conflicting code changes
When instructions from different roles (system, developer, user) provide contradictory directions

What does 'tool-side checks' refer to in model security?

Monitoring user behavior patterns
Checking the hardware specifications where models run
Auditing developer access to model APIs
Validating outputs that tools produce before allowing them to be returned to the user

The lesson mentions testing with 20 variants of conflicting instructions. What is the purpose of testing this many variants?

To increase the model's accuracy score
To comprehensively map how the model handles different types of hierarchy conflicts
To reduce the model's memory usage
To train the model to handle more instructions

What improvement does instruction hierarchy provide against prompt injection attacks, even if it cannot prevent them all?

It eliminates the need for any user monitoring
It makes attacks completely invisible to users
It raises the difficulty for attackers and creates a first line of defense
It automatically patches vulnerabilities in real-time

How Models Implement Instruction Hierarchy in 2026

The premise

What AI does well here

What AI cannot do

End-of-lesson check

How Models Implement Instruction Hierarchy in 2026

The premise

What AI does well here

What AI cannot do

End-of-lesson check