Heartbeats fail in ways reactive agents never do — silent drift, soul-state thrash, infinite loops. Debugging them takes different tools and a different mental model.
A reactive agent fails in front of the user — the bug is in the message you just got. A heartbeat soul fails while you're asleep. By the time you notice, it has run hundreds of beats, mutated its own memory, called dozens of tools, and possibly recovered (or not) without telling anyone. Debugging means rebuilding the story from logs, not watching it happen.
{
"beat_id": "b_2026_04_27_142055_pr-reviewer",
"trigger": { "type": "event", "source": "github.pull_request.opened", "id": "PR-1842" },
"started_at": "2026-04-27T14:20:55Z",
"duration_ms": 4321,
"input_tokens": 8450,
"output_tokens": 612,
"tool_calls": [
{ "name": "github.read_diff", "ok": true },
{ "name": "github.post_review", "ok": true }
],
"memory_deltas": [
{ "key": "recent_reviews", "op": "append", "size": 1 }
],
"outcome": "acted",
"next_beat": null
}
A single beat's structured log. The cost of structured beats is the cost of being able to debug them, so pay it.
The single best heartbeat debugging tool is replay: re-running a past beat against the current code, with the original trigger and memory snapshot, and watching what happens. Reactive agents replay individual messages; heartbeat souls need to replay beats. A good runtime makes this a one-command operation: 'replay beat b_2026_04_27_142055.' The soul wakes up in a sandbox, sees what it saw then, and you watch it think.
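As a rough illustration of what replay involves, here is a minimal sketch. It assumes each beat is stored with its trigger and the memory snapshot taken before the beat ran. `BeatRecord`, `ReplayResult`, `replay_beat`, and the handler signature are hypothetical names made up for this example, not an OpenClaw API.

```python
import copy
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class BeatRecord:
    """Hypothetical stored beat: the trigger plus memory as it was before the beat."""
    beat_id: str
    trigger: dict[str, Any]
    memory_snapshot: dict[str, Any]
    outcome: str


@dataclass
class ReplayResult:
    beat_id: str
    original_outcome: str
    replayed_outcome: str
    memory_after: dict[str, Any]
    tool_calls: list[str] = field(default_factory=list)

    @property
    def diverged(self) -> bool:
        # True when current code decides differently than the original beat did.
        return self.original_outcome != self.replayed_outcome


# Assumed handler shape: (trigger, memory, call_tool) -> outcome string.
Handler = Callable[[dict[str, Any], dict[str, Any], Callable[[str], Any]], str]


def replay_beat(record: BeatRecord, handler: Handler) -> ReplayResult:
    # Work on a deep copy so the stored snapshot is never mutated by the replay.
    memory = copy.deepcopy(record.memory_snapshot)
    calls: list[str] = []

    def sandboxed_tool(name: str) -> dict[str, Any]:
        # Record the call instead of touching the real external system.
        calls.append(name)
        return {"ok": True, "sandboxed": True}

    outcome = handler(record.trigger, memory, sandboxed_tool)
    return ReplayResult(record.beat_id, record.outcome, outcome, memory, calls)


# Usage: a stand-in for the soul's current beat logic.
def current_handler(trigger, memory, call_tool):
    call_tool("github.read_diff")
    memory.setdefault("recent_reviews", []).append(trigger["id"])
    return "acted"


record = BeatRecord(
    beat_id="b_2026_04_27_142055_pr-reviewer",
    trigger={"type": "event", "source": "github.pull_request.opened", "id": "PR-1842"},
    memory_snapshot={"recent_reviews": []},
    outcome="acted",
)
result = replay_beat(record, current_handler)
print(result.diverged, result.tool_calls)
```

The two design choices that matter are the deep copy, so a replay can be repeated against the same snapshot, and the stubbed tool, so a replayed beat cannot post a second review to the real pull request.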
| Failure mode | Symptom | Root-cause direction |
|---|---|---|
| Infinite loop | Beats-per-minute graph goes vertical; budget caps kick in | Self-paced soul picking tiny intervals, or recursive event trigger |
| Soul-state thrash | Memory deltas alternate forward and back every few beats | Two beats writing competing values; missing locks or stale reads |
| Drift | Soul's behavior slowly diverges from its job over days | Memory accumulating noise; bad facts learned and never corrected |
| Phantom no-ops | Soul wakes 1000 times, never acts, beats look fine | Trigger condition is always-true; soul thinks 'nothing to do' every time |
| Stuck retry | Same error every beat, error-rate breaker trips eventually | External tool returning a failure the soul doesn't recognize as fatal |
| Silent staleness | Soul keeps acting on data it stopped refreshing weeks ago | Refresh tool deprecated; soul never noticed |
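Two of the symptoms in the table, phantom no-ops and stuck retries, can be spotted by scanning a sequence of beat logs. The sketch below assumes each log carries an `outcome` field like the sample beat, that a do-nothing beat is recorded as `"noop"`, and that failed beats carry an `error` string. The `"noop"` value, the `error` field, and the thresholds are assumptions for illustration.

```python
from typing import Any, Callable, Iterable

Beat = dict[str, Any]


def longest_run(beats: Iterable[Beat], predicate: Callable[[Beat], bool]) -> int:
    """Length of the longest streak of consecutive beats matching the predicate."""
    best = current = 0
    for beat in beats:
        current = current + 1 if predicate(beat) else 0
        best = max(best, current)
    return best


def longest_repeated_error(beats: Iterable[Beat]) -> int:
    """Length of the longest streak of consecutive beats with the same error string."""
    best = run = 0
    previous = None
    for beat in beats:
        error = beat.get("error")  # hypothetical field; absent on healthy beats
        if error is None:
            run = 0
        elif error == previous:
            run += 1
        else:
            run = 1
        previous = error
        best = max(best, run)
    return best


def triage(beats: list[Beat], noop_run: int = 100, error_run: int = 5) -> list[str]:
    """Return the names of the failure modes whose symptom appears in the logs."""
    findings = []
    if longest_run(beats, lambda b: b.get("outcome") == "noop") >= noop_run:
        findings.append("phantom no-ops")
    if longest_repeated_error(beats) >= error_run:
        findings.append("stuck retry")
    return findings


# Usage with tiny thresholds so the example stays short.
sample = [{"outcome": "noop"}] * 3 + [{"outcome": "error", "error": "HTTP 410"}] * 2
print(triage(sample, noop_run=3, error_run=2))
```

A check like this only flags the symptom. The root-cause column still has to be confirmed by reading or replaying the flagged beats.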
The classic OpenClaw infinite loop has two flavors. The first is a self-paced soul whose state-update logic accidentally always returns 'wake me in 1 second'; the rate limit catches it, but only after a noisy minute. The second is two souls triggering each other: soul A sends a message that triggers soul B's event heartbeat, which sends a message that triggers soul A's event heartbeat. The remedy is the same for both. A rate-limit floor contains the damage, but the real fix is detecting the cycle in the trigger logs and breaking it.
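Detecting the cycle can be sketched as a graph search. Assume the trigger logs can be reduced to pairs of (soul that emitted the event, soul whose heartbeat it woke). A depth-first search over those pairs then finds any loop, including a soul that re-triggers itself. The edge format and the `find_trigger_cycle` name are assumptions for this example.

```python
from typing import Optional


def find_trigger_cycle(edges: list[tuple[str, str]]) -> Optional[list[str]]:
    """Return one cycle as a list of souls (first == last), or None if there is none."""
    graph: dict[str, list[str]] = {}
    for source, target in edges:
        graph.setdefault(source, []).append(target)
        graph.setdefault(target, [])

    UNVISITED, IN_PROGRESS, DONE = 0, 1, 2
    state = {node: UNVISITED for node in graph}
    path: list[str] = []

    def visit(node: str) -> Optional[list[str]]:
        state[node] = IN_PROGRESS
        path.append(node)
        for nxt in graph[node]:
            if state[nxt] == IN_PROGRESS:
                # Reached a soul already on the current path: that is the loop.
                return path[path.index(nxt):] + [nxt]
            if state[nxt] == UNVISITED:
                found = visit(nxt)
                if found:
                    return found
        path.pop()
        state[node] = DONE
        return None

    for node in graph:
        if state[node] == UNVISITED:
            found = visit(node)
            if found:
                return found
    return None


# Usage: the two-soul loop and the self-paced loop described above.
print(find_trigger_cycle([("soul_a", "soul_b"), ("soul_b", "soul_a")]))
print(find_trigger_cycle([("soul_a", "soul_a")]))
print(find_trigger_cycle([("soul_a", "soul_b"), ("soul_b", "soul_c")]))
```

Once a cycle is reported, breaking it is a design decision: drop one of the triggers, or have one soul ignore events that originated from its own earlier beat.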
Thrash happens when two beats — usually two close-together beats from different triggers — disagree on what the memory should say, and each undoes the other's writes. You'll see memory deltas alternating forward and back. The fix is a single coordinator beat (only one type of trigger writes to a given memory key), or proper locks (a beat reads-then-writes atomically). Without one of those, your soul is in a small civil war with itself.
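The locking option can be sketched with an in-process store, where every read-then-write happens under one lock so two beats cannot interleave. `SoulMemory` and its `update` method are hypothetical. A runtime that spreads beats across processes or machines would need the equivalent guarantee from its storage layer, such as a transaction or compare-and-set.

```python
import threading
from typing import Any, Callable


class SoulMemory:
    """Hypothetical in-process memory store with atomic read-modify-write."""

    def __init__(self) -> None:
        self._data: dict[str, Any] = {}
        self._lock = threading.Lock()

    def update(self, key: str, fn: Callable[[Any], Any], default: Any = None) -> Any:
        # Read, transform, and write back while holding the lock, so no other
        # beat can write between this beat's read and its write.
        with self._lock:
            new_value = fn(self._data.get(key, default))
            self._data[key] = new_value
            return new_value

    def get(self, key: str, default: Any = None) -> Any:
        with self._lock:
            return self._data.get(key, default)


# Usage: two "beats" from different triggers bump the same counter concurrently.
memory = SoulMemory()


def beat(times: int) -> None:
    for _ in range(times):
        memory.update("open_reviews", lambda value: value + 1, default=0)


threads = [threading.Thread(target=beat, args=(1000,)) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(memory.get("open_reviews"))  # no increments are lost
```

The point of passing a function to `update` is that the beat never holds a stale copy of the value: it states how to change the value, and the store applies that change to whatever is current.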
Drift is the slow killer. The soul behaves correctly on day one, mostly correctly on day seven, and oddly on day thirty. Usually the cause is accumulated memory — random facts the soul wrote during weird beats, never corrected, now distorting its sense of self. The cure is a periodic memory-consolidation heartbeat that prunes, summarizes, and corrects. Souls that never review their own memory always drift.
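The pruning part of a consolidation beat might look like the sketch below. It assumes each memory entry records a `topic`, a `fact`, a `written_at` timestamp, and a `confidence` score. All four fields and both thresholds are assumptions. Summarizing and correcting would need the soul's model and are left out.

```python
from datetime import datetime, timedelta, timezone
from typing import Any

Entry = dict[str, Any]


def consolidate(
    entries: list[Entry],
    now: datetime,
    max_age: timedelta = timedelta(days=14),
    min_confidence: float = 0.5,
) -> list[Entry]:
    """Drop stale or low-confidence entries and keep only the newest fact per topic."""
    newest: dict[str, Entry] = {}
    for entry in entries:
        if now - entry["written_at"] > max_age:
            continue  # too old to trust without a refresh
        if entry["confidence"] < min_confidence:
            continue  # likely noise written during a weird beat
        kept = newest.get(entry["topic"])
        if kept is None or entry["written_at"] > kept["written_at"]:
            newest[entry["topic"]] = entry
    return sorted(newest.values(), key=lambda e: e["written_at"])


# Usage with made-up memory entries.
now = datetime(2026, 4, 27, tzinfo=timezone.utc)


def days_ago(n: int) -> datetime:
    return now - timedelta(days=n)


entries = [
    {"topic": "review_style", "fact": "be terse", "written_at": days_ago(20), "confidence": 0.9},
    {"topic": "review_style", "fact": "explain reasoning", "written_at": days_ago(2), "confidence": 0.8},
    {"topic": "repo_owner", "fact": "owner is bob", "written_at": days_ago(1), "confidence": 0.2},
    {"topic": "ci", "fact": "CI runs on push", "written_at": days_ago(3), "confidence": 0.9},
    {"topic": "ci", "fact": "CI runs on merge", "written_at": days_ago(1), "confidence": 0.9},
]
cleaned = consolidate(entries, now)
print([e["fact"] for e in cleaned])
```

Running this on a schedule of its own, rather than inside ordinary beats, keeps the cleanup visible in the beat logs as a distinct trigger type.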
The big idea: heartbeat souls fail differently from reactive agents (silently, slowly, and at 3 AM). Per-beat traces and replay are the toolkit, and infinite loops, thrash, and drift are the three failure modes to know by name. Pause the soul first, replay second, then fix the root cause.
15 questions · take it digitally for instant feedback at tendril.neural-forge.io/learn/quiz/end-openclaw-heartbeats-debugging-creators