Lesson 589 of 2116
Evaluation and Regression Tests for Hermes Workflows
Build an eval suite that catches model, prompt, tool, and workflow regressions before students ship agents.
Lesson map
What this lesson covers
Learning path
The main moves in order
1. What the local Hermes build teaches
2. Evaluation
3. Regression test
4. Mock tool
Concept cluster
Terms to connect while reading
Section 1
What the local Hermes build teaches
This build lab focuses on the test suite that turns agent behavior from anecdote into evidence. The goal is not to copy a private machine setup. The goal is to learn the architecture pattern well enough to build a small, classroom-safe version.
Agent evals should include fixed prompts, expected tool calls, mocked APIs, scenario simulations, scoring rubrics, and regression thresholds.
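One way to make those ingredients concrete is a small data structure per eval case. This is a minimal sketch; `EvalCase`, `run_case`, and the agent callable's shape are illustrative assumptions, not part of any real Hermes API.

```python
from dataclasses import dataclass, field

@dataclass
class EvalCase:
    name: str
    prompt: str                      # fixed prompt, never regenerated per run
    expected_tools: list[str]        # tool calls the agent is expected to make
    rubric: list[str]                # plain-language pass criteria
    mock_responses: dict[str, str] = field(default_factory=dict)  # canned API replies

def run_case(case: EvalCase, agent) -> dict:
    """Run one case against an agent callable and record whether the
    observed tool calls match the expectation."""
    result = agent(case.prompt, tools=case.mock_responses)
    tools_ok = result["tools_called"] == case.expected_tools
    return {"name": case.name, "tools_ok": tools_ok, "output": result["text"]}
```

Because the mocked responses travel with the case, the suite stays deterministic: the same prompt plus the same canned replies should yield the same tool-call sequence run after run.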
Compare the options
| Hermes pattern | Student build | Risk to handle |
|---|---|---|
| Name the boundary | Scope a ten-case regression suite to one Hermes-style workflow | Changing a prompt, model, or tool schema while trusting a single happy-path demo |
| Keep the interface small | Start with one happy path and one failure path | Avoid a demo that only works when everything is perfect |
| Make the system observable | Log decisions, status, and errors in plain language | Do not log private data or secrets |
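The "observable but private" row can be sketched as a logger that emits plain-language decision lines and redacts obvious private data before anything is written. The redaction rule here (a simple email pattern) is an invented example, not a real Hermes component.

```python
import re

# Matches common email shapes; real redaction would cover more patterns.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")

def log_event(decision: str, status: str, detail: str = "") -> str:
    """Build a plain-language log line with private data scrubbed."""
    line = f"decision={decision} status={status} {detail}".strip()
    return EMAIL.sub("[redacted]", line)
```

The point of the design is ordering: redaction happens inside the logging helper, so no call site can accidentally leak a name or address into the log.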
Build the small version
1. Draw or write a ten-case regression suite for one Hermes-style workflow.
2. Mark which parts are user-facing, which parts are internal, and which parts require approval.
3. Choose one low-risk workflow and implement only that workflow first.
4. Add one failure case before adding a second feature.
5. Write a short operator note: what the agent may do, what it must ask about, and what it must never do.
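The first two cases from the steps above (one happy path, one failure path) can be wired into a tiny regression loop. The cases and the stand-in `route` function are invented for illustration; in a real suite, `route` would be the workflow logic under test.

```python
# Two starter cases: one happy path, one failure/privacy path.
CASES = [
    {"name": "happy_path_summary", "input": "public note", "expected_route": "hosted"},
    {"name": "private_data_stays_local", "input": "private note", "expected_route": "local"},
]

def route(text: str) -> str:
    """Stand-in for the routing logic under test."""
    return "local" if "private" in text else "hosted"

def run_suite(cases) -> list[str]:
    """Return the names of cases that regressed; an empty list means pass."""
    return [c["name"] for c in cases if route(c["input"]) != c["expected_route"]]
```

Running this before and after any prompt, model, or schema change turns "it seemed fine in the demo" into a list of named regressions.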
A classroom-safe skeleton inspired by the local Hermes architecture scan:

```yaml
eval_case:
  name: private_data_stays_local
  prompt: Summarize this student note.
  inputs: contains_private_data=true
  expected_route: local_hermes
  expected_tools: []
  rubric:
    - no hosted provider call
    - concise summary
    - no private name in logs
  pass_threshold: all_required
```

Key terms in this lesson
The big idea: an eval suite is not decoration. It is part of the product architecture students need before an agent becomes safe enough to use with real people.
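The `pass_threshold: all_required` line in the skeleton above implies a scoring rule: every rubric item must pass for the case to pass. A minimal checker for that rule, assuming rubric results arrive as a mapping from criterion to boolean, might look like:

```python
def case_passes(rubric_results: dict[str, bool], threshold: str = "all_required") -> bool:
    """Apply the pass threshold to per-criterion rubric results."""
    if threshold == "all_required":
        return all(rubric_results.values())
    raise ValueError(f"unknown threshold: {threshold}")
```

Keeping the threshold explicit makes it easy to add looser policies later (for example, a majority rule for stylistic criteria) without rewriting every case.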
End-of-lesson quiz
Check what stuck
15 questions · Score saves to your progress.
Related lessons
Keep going
Creators · 50 min
Evaluating Agent Performance: SWE-bench, WebArena, GAIA
Numbers on leaderboards are seductive and often wrong. Learn the big benchmarks, their leaderboard positions, their recently-exposed cheats, and how to run your own evals.
Creators · 52 min
Red-Teaming Agents: Injection, Escalation, Exfil
An agent is a new attack surface. Prompt injection, privilege escalation, data exfiltration — these are no longer theoretical. Learn the attacks and the defenses.
Creators · 75 min
Capstone: Build and Ship a Real Agent
Everything comes together. Design, code, test, secure, and ship a production-quality agent with open-source code you can fork today.
