Bias Audits That Catch Problems Before Deployment: A Production Audit Pipeline
A bias audit run once at deployment misses everything that emerges in production: distribution shift, edge-case interactions, fairness drift. A real audit pipeline runs continuously and surfaces issues to humans for evaluation.
11 min · Reviewed 2026
The premise
Bias audits at deployment catch only what was tested; production audits catch what emerges with real users.
What AI does well here
Define fairness metrics appropriate to the use case (demographic parity, equal opportunity, calibration) before launch; the first code sketch after this list shows how each might be computed
Implement automated audits running on production traffic with alerting on drift; the second sketch below shows a thresholded alerting loop
Maintain a fairness incident process — what happens when an audit flags a problem
Document the protected attributes and proxies the system might be using; the third sketch below shows a crude proxy screen
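To make the first item concrete, here is a minimal sketch of the three metrics in plain Python with NumPy. The function names and the binning scheme are illustrative assumptions, not a standard library API; inputs are binary predictions y_pred, scores y_prob, true labels y_true, and a per-example group label.

```python
import numpy as np

def demographic_parity_gap(y_pred, group):
    """Largest difference in positive-prediction rate across groups."""
    rates = [y_pred[group == g].mean() for g in np.unique(group)]
    return max(rates) - min(rates)

def equal_opportunity_gap(y_pred, y_true, group):
    """Largest difference in true-positive rate across groups.
    Assumes every group has at least one positive example."""
    tprs = [y_pred[(group == g) & (y_true == 1)].mean()
            for g in np.unique(group)]
    return max(tprs) - min(tprs)

def calibration_gap(y_prob, y_true, group, bins=10):
    """Worst per-group calibration error: within equal-width probability
    bins, compare mean predicted probability to the observed positive
    rate, average the absolute gaps, and take the max over groups."""
    errors = []
    for g in np.unique(group):
        p, t = y_prob[group == g], y_true[group == g]
        bin_ids = np.clip((p * bins).astype(int), 0, bins - 1)
        gaps = [abs(p[bin_ids == b].mean() - t[bin_ids == b].mean())
                for b in range(bins) if (bin_ids == b).any()]
        errors.append(float(np.mean(gaps)))
    return max(errors)
```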
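The second sketch shows one way automated alerting and the hand-off to the incident process might fit together. The threshold values, the FairnessAlert shape, and the audit_window helper are assumptions for illustration; real thresholds are policy choices made in the pre-launch fairness review.

```python
from dataclasses import dataclass

@dataclass
class FairnessAlert:
    metric: str
    value: float
    threshold: float
    window: str

# Illustrative policy choices, not universal constants.
THRESHOLDS = {"demographic_parity_gap": 0.05,
              "equal_opportunity_gap": 0.05,
              "calibration_gap": 0.03}

def audit_window(metrics: dict, window: str) -> list:
    """Compare one monitoring window's metric values against thresholds.
    Exceeding a threshold routes to human review; nothing here
    auto-remediates."""
    return [FairnessAlert(name, value, THRESHOLDS[name], window)
            for name, value in metrics.items()
            if value > THRESHOLDS[name]]

# Example: metric values computed over the last 24h of production traffic.
alerts = audit_window({"demographic_parity_gap": 0.08,
                       "equal_opportunity_gap": 0.02,
                       "calibration_gap": 0.01},
                      window="last-24h")
for a in alerts:
    # A real pipeline would page the fairness on-call and open an
    # incident ticket here rather than print.
    print(f"FAIRNESS ALERT [{a.window}] {a.metric}={a.value:.3f} "
          f"exceeds {a.threshold:.3f}; routing to human review")
```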
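For the proxy item, a crude first-pass screen might look like the sketch below; the proxy_candidates helper and the 0.3 cutoff are hypothetical. Linear correlation misses nonlinear proxies and flags some benign features, so the output feeds documentation and human review, not automatic feature removal.

```python
import numpy as np

def proxy_candidates(features: dict, protected, cutoff=0.3):
    """Flag features whose correlation with a binary (0/1) protected
    attribute exceeds the cutoff; returns {feature_name: correlation}."""
    flagged = {}
    for name, values in features.items():
        r = np.corrcoef(values, protected)[0, 1]
        if abs(r) > cutoff:
            flagged[name] = round(float(r), 3)
    return flagged
```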
What AI cannot do
Resolve the trade-offs between competing fairness metrics (when base rates differ across groups, no model can satisfy them all at once; see the worked example after this list)
Replace human review of borderline fairness cases
Substitute for the diverse stakeholder input that defines what 'fair' means in context
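One reason this cannot be automated away: when base rates differ across groups, demographic parity and equal opportunity are mathematically incompatible, so someone must choose which gap to accept. The numbers below are invented purely to make the conflict visible.

```python
# Hypothetical applicant pools: 100 per group, different base rates.
qualified = {"A": 60, "B": 30}
# Equal selection counts, so demographic parity holds exactly.
selected = {"A": 50, "B": 50}

for g in ("A", "B"):
    # Best case for each group: every slot goes to a qualified applicant first.
    tpr = min(selected[g], qualified[g]) / qualified[g]
    print(f"group {g}: selection rate {selected[g] / 100:.0%}, "
          f"best-case TPR {tpr:.0%}")
# group A: selection rate 50%, best-case TPR 83%
# group B: selection rate 50%, best-case TPR 100%
# Even under the most favorable assignment, equal selection rates force
# unequal true-positive rates whenever base rates differ. Picking which
# metric to sacrifice is a values decision, not a computation.
```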
End-of-lesson check
15 questions · take it digitally for instant feedback at tendril.neural-forge.io/learn/quiz/end-ethics-safety-bias-audit-pipeline-adults
Which fairness metric is satisfied when the probability of receiving a positive outcome is equal across all demographic groups?
Demographic parity
Equal opportunity
Calibration
Predictive parity
What is 'fairness drift' in a production AI system?
A gradual increase in model accuracy over time
A deliberate adjustment of fairness thresholds by operators
The process of retraining models on new data
Changes in fairness metric values due to shifts in real-world data distributions
A company chooses demographic parity over equal opportunity for its hiring algorithm. What is the most accurate characterization of this decision?
A purely technical optimization choice
A values-based decision with significant consequences
A decision that can be automated by the AI system
A choice that has no impact on affected individuals
Why can AI systems not fully resolve trade-offs between competing fairness metrics?
Because the metrics were poorly designed
Because the algorithms are not sophisticated enough
Because different fairness metrics encode different ethical values that require human judgment
Because production data is insufficient
In a production bias audit pipeline, what is the primary function of setting thresholds that trigger human review?
To completely automate fairness decisions
To reduce the need for any human involvement
To flag situations requiring contextual judgment beyond automated metrics
To permanently disable the system when exceeded
What is a 'proxy' attribute in fairness monitoring?
The primary output of the AI model
A fairness metric that is no longer in use
A feature that inadvertently correlates with protected attributes
An alternative name for protected attributes
What does 'calibration' measure in fairness contexts?
Whether different groups receive similar numbers of positive predictions
Whether predicted probabilities match actual outcomes within each group
Whether the model performs equally well across all groups
Whether the model treats similar individuals similarly
Why is stakeholder input necessary when defining what 'fair' means for an AI system?
To increase the model's prediction accuracy
To comply with technical requirements
Because stakeholders can help clean the training data
Because different stakeholders may have legitimately different conceptions of fairness relevant to their communities
What is 'disparate impact' in employment-related AI auditing?
A policy or practice that disproportionately harms a protected group even without discriminatory intent
Intentional discrimination against a protected group
The difference in pay between executives and workers
A measure of model accuracy across demographic groups
What is the fundamental limitation of running bias audits only at the time of deployment?
They require too many computational resources
They cannot detect issues that emerge from real-world usage patterns, distribution shift, and edge-case interactions
Deployment audits are illegal
Deployment audits are too expensive
What should trigger a fairness incident response process?
A single instance of model prediction error
A complaint from a competitor
Automated audit flags exceeding defined thresholds
A change in the company's stock price
What role do automated audits play in a production bias audit pipeline?
They continuously monitor production traffic and alert when fairness metrics drift
They determine the final fairness decisions for the organization
They are only run during system development
They replace the need for any human oversight
Why is publishing fairness documentation externally important?
It is required of all AI companies
It enables external accountability, builds trust with affected communities, and allows independent verification
It reduces the company's legal liability
It improves the model's accuracy
When fairness metrics in production exceed threshold values, what is the appropriate immediate action?
Ignore the alert if the system is still functioning
Initiate human review and follow the incident response process
Retrain the model with the same data
Immediately shut down the system permanently
Why might 'equal opportunity' be preferred over 'demographic parity' in some contexts?
Demographic parity has been proven to be illegal
Equal opportunity is mathematically easier to achieve
Equal opportunity conditions on actual qualification, giving qualified individuals the same chance of a positive outcome regardless of group, which may align better with merit-based contexts
Equal opportunity requires less computational resources