Heartbeats fail in ways reactive agents never do — silent drift, soul-state thrash, infinite loops. Debugging them takes different tools and a different mental model.
A reactive agent fails in front of the user — the bug is in the message you just got. A heartbeat soul fails while you're asleep. By the time you notice, it has run hundreds of beats, mutated its own memory, called dozens of tools, and possibly recovered (or not) without telling anyone. Debugging means rebuilding the story from logs, not watching it happen.
    {
      "beat_id": "b_2026_04_27_142055_pr-reviewer",
      "trigger": { "type": "event", "source": "github.pull_request.opened", "id": "PR-1842" },
      "started_at": "2026-04-27T14:20:55Z",
      "duration_ms": 4321,
      "input_tokens": 8450,
      "output_tokens": 612,
      "tool_calls": [
        { "name": "github.read_diff", "ok": true },
        { "name": "github.post_review", "ok": true }
      ],
      "memory_deltas": [
        { "key": "recent_reviews", "op": "append", "size": 1 }
      ],
      "outcome": "acted",
      "next_beat": null
    }

A single beat's structured log. The cost of structured beats is the cost of being able to debug them: pay it.

The single best heartbeat debugging tool is replay: re-running a past beat against the current code, with the original trigger and memory snapshot, and watching what happens. Reactive agents replay individual messages; heartbeat souls need to replay beats. A good runtime makes this a one-command operation: 'replay beat b_2026_04_27_142055.' The soul wakes up in a sandbox, sees what it saw then, and you watch it think.
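The replay idea above can be sketched in a few lines. This is a minimal illustration, not any runtime's actual API: `BeatRecord`, `ReplayResult`, and `replay_beat` are hypothetical names, and it assumes the runtime stored the memory snapshot from before the beat alongside the log and that a beat handler is a plain function of trigger and memory.

```python
import copy
from dataclasses import dataclass
from typing import Any, Callable

# Hypothetical handler shape: takes (trigger, memory), mutates memory, returns an outcome.
Handler = Callable[[dict[str, Any], dict[str, Any]], str]


@dataclass
class BeatRecord:
    """What a runtime would need to keep for a beat to be replayable (assumed fields)."""
    beat_id: str
    trigger: dict[str, Any]
    memory_snapshot: dict[str, Any]  # memory as it was before the beat ran
    outcome: str                     # what the beat did the first time


@dataclass
class ReplayResult:
    beat_id: str
    original_outcome: str
    replayed_outcome: str
    memory_after: dict[str, Any]

    @property
    def diverged(self) -> bool:
        """True when the current code no longer does what the original beat did."""
        return self.original_outcome != self.replayed_outcome


def replay_beat(record: BeatRecord, handler: Handler) -> ReplayResult:
    """Run the current handler against a past beat without touching live state."""
    # Deep copies keep the stored record intact, so the same beat can be replayed again.
    sandbox_memory = copy.deepcopy(record.memory_snapshot)
    replayed = handler(copy.deepcopy(record.trigger), sandbox_memory)
    return ReplayResult(record.beat_id, record.outcome, replayed, sandbox_memory)
```

The handler should be given sandboxed tools as well, so that a replayed beat cannot post a second review or send a second message. That part depends on the runtime and is left out here.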
| Failure mode | Symptom | Root-cause direction |
|---|---|---|
| Infinite loop | Beats-per-minute graph goes vertical; budget caps kick in | Self-paced soul picking tiny intervals, or recursive event trigger |
| Soul-state thrash | Memory deltas alternate forward and back every few beats | Two beats writing competing values; missing locks or stale reads |
| Drift | Soul's behavior slowly diverges from its job over days | Memory accumulating noise; bad facts learned and never corrected |
| Phantom no-ops | Soul wakes 1000 times, never acts, beats look fine | Trigger condition is always-true; soul thinks 'nothing to do' every time |
| Stuck retry | Same error every beat, error-rate breaker trips eventually | External tool returning a failure the soul doesn't recognize as fatal |
| Silent staleness | Soul keeps acting on data it stopped refreshing weeks ago | Refresh tool deprecated; soul never noticed |
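Two of the quieter rows in the table, phantom no-ops and stuck retry, can be spotted mechanically from per-beat logs. The sketch below is one possible check, under assumptions the lesson does not state: that beats which do nothing are logged with `"outcome": "noop"`, that failed beats carry an `"error"` string, and that the thresholds shown are reasonable defaults. Only `"outcome": "acted"` appears in the sample log; the rest is hypothetical.

```python
from collections import Counter
from typing import Any, Iterable


def flag_quiet_failures(
    beats: Iterable[dict[str, Any]],
    min_beats: int = 20,
    error_share: float = 0.8,
) -> list[str]:
    """Return flags for failure modes that look healthy beat by beat."""
    beats = list(beats)
    flags: list[str] = []
    if len(beats) < min_beats:
        return flags  # too little history to say anything

    # Phantom no-ops: the soul keeps waking and never once acts.
    outcomes = Counter(b.get("outcome", "unknown") for b in beats)
    if outcomes.get("noop", 0) == len(beats):
        flags.append("phantom_noops")

    # Stuck retry: one error message dominates the window.
    errors = Counter(b["error"] for b in beats if b.get("error"))
    if errors:
        message, count = errors.most_common(1)[0]
        if count / len(beats) >= error_share:
            flags.append(f"stuck_retry: {message}")

    return flags
```

Running a check like this on a schedule turns "beats look fine" into a question about the whole window rather than any single beat.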
The classic OpenClaw infinite loop has two flavors. The first is a self-paced soul whose state-update logic accidentally always returns 'wake me in 1 second'; the rate limit catches it, but only after a noisy minute. The second is two souls triggering each other: soul A sends a message that triggers soul B's event heartbeat, which sends a message that triggers soul A's event heartbeat. Both are handled the same way: the rate-limit floor contains the damage, but the real fix is detecting the cycle in the trigger logs and breaking it.
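Detecting the cycle can be done with an ordinary depth-first search over "who woke whom". This is a sketch that assumes the trigger logs can be reduced to `(emitting_soul, woken_soul)` pairs; that shape, and the function name `find_trigger_cycle`, are illustrative and not taken from any OpenClaw API.

```python
from collections import defaultdict
from typing import Iterable, Optional


def find_trigger_cycle(edges: Iterable[tuple[str, str]]) -> Optional[list[str]]:
    """Return one cycle as a list of soul names (first == last), or None."""
    graph: dict[str, set[str]] = defaultdict(set)
    for emitter, woken in edges:
        graph[emitter].add(woken)

    UNSEEN, IN_PROGRESS, DONE = 0, 1, 2
    state: dict[str, int] = defaultdict(int)
    path: list[str] = []

    def visit(node: str) -> Optional[list[str]]:
        state[node] = IN_PROGRESS
        path.append(node)
        for nxt in sorted(graph.get(node, ())):
            if state[nxt] == IN_PROGRESS:
                # nxt is already on the current path: the path from nxt back to nxt is a cycle.
                return path[path.index(nxt):] + [nxt]
            if state[nxt] == UNSEEN:
                found = visit(nxt)
                if found:
                    return found
        path.pop()
        state[node] = DONE
        return None

    for node in sorted(graph):
        if state[node] == UNSEEN:
            found = visit(node)
            if found:
                return found
    return None
```

A soul whose own output wakes it again shows up as a one-node cycle. Breaking the cycle is then a design decision: drop one of the triggers, or mark messages so a soul ignores events it caused.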
Thrash happens when two beats — usually two close-together beats from different triggers — disagree on what the memory should say, and each undoes the other's writes. You'll see memory deltas alternating forward and back. The fix is a single coordinator beat (only one type of trigger writes to a given memory key), or proper locks (a beat reads-then-writes atomically). Without one of those, your soul is in a small civil war with itself.
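The alternating pattern is easy to check for once memory deltas are logged. The sketch below assumes each delta records the value it wrote under a `"value"` field; the sample log earlier only shows `key`, `op`, and `size`, so that field and the `min_flips` threshold are assumptions for illustration.

```python
from collections import defaultdict
from typing import Any, Iterable


def count_flip_flops(writes: list[Any]) -> int:
    """Count writes that restore the value from two writes ago (A, B, A, ...)."""
    flips = 0
    for i in range(2, len(writes)):
        if writes[i] == writes[i - 2] and writes[i] != writes[i - 1]:
            flips += 1
    return flips


def thrashing_keys(deltas: Iterable[dict[str, Any]], min_flips: int = 3) -> dict[str, int]:
    """Map each memory key that looks like it is thrashing to its flip-flop count."""
    writes_by_key: dict[str, list[Any]] = defaultdict(list)
    for delta in deltas:  # deltas must be in beat order
        writes_by_key[delta["key"]].append(delta["value"])

    result: dict[str, int] = {}
    for key, writes in writes_by_key.items():
        flips = count_flip_flops(writes)
        if flips >= min_flips:
            result[key] = flips
    return result
```

A flagged key tells you where to look; the beat IDs of the competing writes tell you which two triggers need a coordinator or a lock.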
Drift is the slow killer. The soul behaves correctly on day one, mostly correctly on day seven, and oddly on day thirty. Usually the cause is accumulated memory — random facts the soul wrote during weird beats, never corrected, now distorting its sense of self. The cure is a periodic memory-consolidation heartbeat that prunes, summarizes, and corrects. Souls that never review their own memory always drift.
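The pruning half of a consolidation beat can be shown concretely. This sketch assumes memory is a list of entries with `key`, `value`, and `written_at`, plus an optional `pinned` flag for facts that must never expire; all of those names and the 14-day default are hypothetical. Summarizing and correcting entries usually needs the model itself and is not shown.

```python
from datetime import datetime, timedelta
from typing import Any


def consolidate(
    entries: list[dict[str, Any]],
    now: datetime,
    max_age: timedelta = timedelta(days=14),
) -> list[dict[str, Any]]:
    """Keep the newest entry per key, then drop unpinned entries older than max_age."""
    latest: dict[str, dict[str, Any]] = {}
    for entry in entries:
        previous = latest.get(entry["key"])
        if previous is None or entry["written_at"] > previous["written_at"]:
            latest[entry["key"]] = entry

    kept = [
        entry
        for entry in latest.values()
        if entry.get("pinned") or now - entry["written_at"] <= max_age
    ]
    # Sorted output makes consecutive consolidation runs easy to diff.
    return sorted(kept, key=lambda entry: entry["key"])
```

Logging what each consolidation run removed, as its own memory deltas, keeps the cleanup itself debuggable.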
The big idea: heartbeats fail differently than reactive agents — silently, slowly, and at 3 AM. Per-beat traces, replay, and three named failure modes — infinite loops, thrash, drift — are the toolkit. Pause first, replay second, fix the root cause.
8 questions · take it digitally for instant feedback at tendril.neural-forge.io/learn/quiz/end-openclaw-heartbeats-debugging-creators
What is the main idea of "Debugging A Heartbeat Loop: Observability, Replay, And Failure Modes"?
Which concept is most central to "Debugging A Heartbeat Loop: Observability, Replay, And Failure Modes"?
Which use of AI fits this topic best?
What should a careful learner remember about "Replay catches Heisenbugs"?
You want to use AI after this lesson. What is the safest next step?
How should AI output about observability be treated?
Name one way to verify an AI answer about observability.
Which action would help you apply "Debugging A Heartbeat Loop: Observability, Replay, And Failure Modes" responsibly?