Knowledge check · 15 questions
Tests understanding of how to build comprehensive eval suites that measure an AI agent's trajectories, cost, safety, and regressions.
AI Agent Evaluation Harnesses: Beyond Pass/Fail — Quick Check