Debugging A Heartbeat Loop: Observability, Replay, And Failure Modes

Heartbeats fail in ways reactive agents never do — silent drift, soul-state thrash, infinite loops. Debugging them takes different tools and a different mental model.

10 min · Reviewed 2026

Why heartbeats are harder to debug

A reactive agent fails in front of the user — the bug is in the message you just got. A heartbeat soul fails while you're asleep. By the time you notice, it has run hundreds of beats, mutated its own memory, called dozens of tools, and possibly recovered (or not) without telling anyone. Debugging means rebuilding the story from logs, not watching it happen.

What good observability looks like

Per-beat trace: every beat logs its trigger, its model input, its tool calls, its memory deltas, and its outcome
Beat timeline: a chart of beats per minute over time, so spikes and silences are visible at a glance
Soul state diff: snapshots of soul memory before/after each beat, browseable in the dashboard
Tool-call audit: an immutable log of every external action, with the beat ID that caused it
Token and cost ledger: live numbers, not 'check your bill next month'

{
  "beat_id": "b_2026_04_27_142055_pr-reviewer",
  "trigger": { "type": "event", "source": "github.pull_request.opened", "id": "PR-1842" },
  "started_at": "2026-04-27T14:20:55Z",
  "duration_ms": 4321,
  "input_tokens": 8450,
  "output_tokens": 612,
  "tool_calls": [
    { "name": "github.read_diff", "ok": true },
    { "name": "github.post_review", "ok": true }
  ],
  "memory_deltas": [
    { "key": "recent_reviews", "op": "append", "size": 1 }
  ],
  "outcome": "acted",
  "next_beat": null
}

A single beat's structured log. The cost of structured beats is the cost of being able to debug them, so pay it.
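The beat timeline can be derived directly from traces like the one above. The following is a minimal sketch, assuming each trace carries the `started_at` field shown in the log; the function names and the `spike_threshold` default are illustrative, not part of any runtime.

```python
from collections import Counter
from datetime import datetime, timedelta


def beats_per_minute(traces):
    """Count beats per minute bucket, filling in empty minutes between first and last beat."""
    minutes = [
        datetime.fromisoformat(t["started_at"].replace("Z", "+00:00")).replace(
            second=0, microsecond=0
        )
        for t in traces
    ]
    if not minutes:
        return {}
    counts = Counter(minutes)
    timeline = {}
    cursor, last = min(minutes), max(minutes)
    while cursor <= last:
        timeline[cursor] = counts.get(cursor, 0)  # 0 marks a silent minute
        cursor += timedelta(minutes=1)
    return timeline


def flag_anomalies(timeline, spike_threshold=30):
    """Return (spikes, silences): minutes at or above the threshold, and minutes with no beats."""
    spikes = [minute for minute, n in timeline.items() if n >= spike_threshold]
    silences = [minute for minute, n in timeline.items() if n == 0]
    return spikes, silences
```

Filling in the empty minutes matters: a silence is only visible if the minutes with zero beats appear in the series at all.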

Replay: the heartbeat-debug superpower

The single best heartbeat debugging tool is replay: re-running a past beat against the current code, with the original trigger and memory snapshot, and watching what happens. Reactive agents replay individual messages; heartbeat souls need to replay whole beats. A good runtime makes this a one-command operation, such as 'replay beat b_2026_04_27_142055_pr-reviewer'. The soul wakes up in a sandbox, sees what it saw then, and you watch it think.
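What a replay harness does can be sketched in a few lines. This is an illustration under stated assumptions, not a real runtime's API: `replay_beat`, `RecordingTools`, and the `pr_reviewer` handler are hypothetical names, and the handler signature `(trigger, memory, tools)` is invented for the example. The trace fields match the structured log shown earlier.

```python
import copy


class RecordingTools:
    """Sandbox stand-in for real tools: records calls instead of performing them."""

    def __init__(self):
        self.calls = []

    def call(self, name, **kwargs):
        self.calls.append({"name": name, "args": kwargs})
        return {"ok": True, "sandboxed": True}


def replay_beat(trace, memory_snapshot, handler):
    """Re-run one past beat against current code and report what changed."""
    memory = copy.deepcopy(memory_snapshot)  # never mutate the stored snapshot
    tools = RecordingTools()
    outcome = handler(trace["trigger"], memory, tools)
    original_calls = [c["name"] for c in trace["tool_calls"]]
    replayed_calls = [c["name"] for c in tools.calls]
    return {
        "outcome": outcome,
        "outcome_changed": outcome != trace["outcome"],
        "tool_calls_changed": original_calls != replayed_calls,
        "replayed_tool_calls": replayed_calls,
        "memory_after": memory,
    }


def pr_reviewer(trigger, memory, tools):
    """Hypothetical current version of the soul's beat handler."""
    tools.call("github.read_diff", pr=trigger["id"])
    tools.call("github.post_review", pr=trigger["id"])
    memory["recent_reviews"].append(trigger["id"])
    return "acted"
```

The two properties worth copying are that tool calls are recorded rather than executed, so a replay cannot post a second review, and that the memory snapshot is deep-copied, so the evidence stays intact for the next replay.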

Six failure modes you will see

Failure mode	Symptom	Root-cause direction
Infinite loop	Beats-per-minute graph goes vertical; budget caps kick in	Self-paced soul picking tiny intervals, or recursive event trigger
Soul-state thrash	Memory deltas alternate forward and back every few beats	Two beats writing competing values; missing locks or stale reads
Drift	Soul's behavior slowly diverges from its job over days	Memory accumulating noise; bad facts learned and never corrected
Phantom no-ops	Soul wakes 1000 times, never acts, beats look fine	Trigger condition is always-true; soul thinks 'nothing to do' every time
Stuck retry	Same error every beat, error-rate breaker trips eventually	External tool returning a failure the soul doesn't recognize as fatal
Silent staleness	Soul keeps acting on data it stopped refreshing weeks ago	Refresh tool deprecated; soul never noticed
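Two of the rows above, phantom no-ops and stuck retry, can be spotted mechanically in the per-beat traces. The sketch below assumes an `outcome` value of `"no_op"` and an `error` field on failed beats; neither appears in the sample log, so treat both as hypothetical, along with the function names and default limits.

```python
def longest_run(flags):
    """Length of the longest run of consecutive True values."""
    best = run = 0
    for flag in flags:
        run = run + 1 if flag else 0
        best = max(best, run)
    return best


def longest_repeated_error(traces):
    """Length of the longest run of consecutive beats that share the same error."""
    best = run = 0
    previous = None
    for trace in traces:
        error = trace.get("error")  # hypothetical field: absent when the beat succeeded
        if error is None:
            run = 0
        elif error == previous:
            run += 1
        else:
            run = 1
        previous = error
        best = max(best, run)
    return best


def scan_traces(traces, no_op_limit=100, retry_limit=5):
    """Return the names of failure modes whose symptoms appear in the traces."""
    findings = []
    if longest_run(t["outcome"] == "no_op" for t in traces) >= no_op_limit:
        findings.append("phantom no-ops")
    if longest_repeated_error(traces) >= retry_limit:
        findings.append("stuck retry")
    return findings
```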

Infinite loops, in detail

The classic OpenClaw infinite loop has two flavors. The first is a self-paced soul whose state-update logic accidentally always returns 'wake me in 1 second'. The rate limit catches it, but only after a noisy minute. The second is two souls triggering each other: soul A sends a message that triggers soul B's event heartbeat, which sends a message that triggers soul A's event heartbeat. The approach to both is the same. The rate limit floor is your friend because it contains the damage, but the real fix is detecting the cycle in the trigger logs and breaking it.
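Cycle detection over trigger logs is ordinary graph work. The sketch below assumes the logs can be reduced to `(source_soul, target_soul)` pairs, meaning "a beat of the source caused a beat of the target"; that reduction and the function name are assumptions, not a documented log format. It uses a depth-first search with three node states.

```python
def find_trigger_cycle(edges):
    """Return one trigger cycle as a list of soul names, or None if there is none."""
    graph = {}
    for source, target in edges:
        graph.setdefault(source, set()).add(target)
        graph.setdefault(target, set())

    unvisited, in_progress, done = 0, 1, 2
    state = {node: unvisited for node in graph}
    path = []

    def visit(node):
        state[node] = in_progress
        path.append(node)
        for nxt in sorted(graph[node]):
            if state[nxt] == in_progress:
                # nxt is already on the current path, so the path loops back to it
                return path[path.index(nxt):] + [nxt]
            if state[nxt] == unvisited:
                found = visit(nxt)
                if found:
                    return found
        path.pop()
        state[node] = done
        return None

    for node in sorted(graph):
        if state[node] == unvisited:
            found = visit(node)
            if found:
                return found
    return None
```

A soul that triggers itself shows up as a one-node cycle, so the same check covers a recursive event trigger on a single soul.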

Soul-state thrash

Thrash happens when two beats — usually two close-together beats from different triggers — disagree on what the memory should say, and each undoes the other's writes. You'll see memory deltas alternating forward and back. The fix is a single coordinator beat (only one type of trigger writes to a given memory key), or proper locks (a beat reads-then-writes atomically). Without one of those, your soul is in a small civil war with itself.
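One way to get an atomic read-then-write is optimistic versioning: every write states which version it read, and the store rejects the write if the key has changed since. This is a swapped-in technique chosen for illustration, not necessarily what any heartbeat runtime provides, and `VersionedMemory` and `StaleWrite` are hypothetical names.

```python
import threading


class StaleWrite(Exception):
    """Raised when a beat tries to write based on a version that is no longer current."""


class VersionedMemory:
    def __init__(self):
        self._lock = threading.Lock()
        self._data = {}  # key -> (version, value)

    def read(self, key):
        """Return (version, value); a missing key reads as version 0 with value None."""
        with self._lock:
            return self._data.get(key, (0, None))

    def write(self, key, value, expected_version):
        """Write only if the key is still at expected_version; return the new version."""
        with self._lock:
            current_version, _ = self._data.get(key, (0, None))
            if current_version != expected_version:
                raise StaleWrite(
                    f"{key}: expected version {expected_version}, found {current_version}"
                )
            self._data[key] = (current_version + 1, value)
            return current_version + 1
```

With this in place, the second of two close-together beats fails loudly instead of silently undoing the first, and it can re-read and decide again with fresh data.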

Drift

Drift is the slow killer. The soul behaves correctly on day one, mostly correctly on day seven, and oddly on day thirty. Usually the cause is accumulated memory — random facts the soul wrote during weird beats, never corrected, now distorting its sense of self. The cure is a periodic memory-consolidation heartbeat that prunes, summarizes, and corrects. Souls that never review their own memory always drift.
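The pruning half of a consolidation beat can be sketched without a model in the loop. The example assumes memory is a list of entries with `key`, `value`, and `written_at` fields, which is a hypothetical shape; the 30-day default is likewise arbitrary. Summarizing and correcting would normally need the model itself and are left out.

```python
from datetime import datetime, timedelta, timezone


def consolidate(entries, now, max_age=timedelta(days=30)):
    """Drop entries older than max_age and keep only the newest entry per key."""
    newest = {}
    for entry in entries:
        if now - entry["written_at"] > max_age:
            continue  # prune stale facts
        kept = newest.get(entry["key"])
        if kept is None or entry["written_at"] > kept["written_at"]:
            newest[entry["key"]] = entry  # a later write supersedes an earlier one
    return sorted(newest.values(), key=lambda entry: entry["written_at"])
```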

Apply: the four-step debug ritual

Pause the soul — preserve state, stop the bleeding
Pull the beat timeline; find the moment behavior changed
Replay the suspect beat in a sandbox with current code
Decide: is this a code fix, a config fix, a memory fix, or a trigger-logic fix? Fix the root cause, not the symptom

The big idea: heartbeats fail differently from reactive agents. They fail silently, slowly, and at 3 AM. Per-beat traces, replay, and named failure modes (infinite loops, thrash, and drift chief among them) are the toolkit. Pause first, replay second, fix the root cause.

End-of-lesson check

15 questions · take it digitally for instant feedback at tendril.neural-forge.io/learn/quiz/end-openclaw-heartbeats-debugging-creators

What is the core idea behind "Debugging A Heartbeat Loop: Observability, Replay, And Failure Modes"?
1. Heartbeats fail in ways reactive agents never do — silent drift, soul-state thrash, infinite loops. Debugging them takes different tools and a different mental model.
2. trigger composition
3. rate limit
4. Before writing any heartbeat config, write its budget block first
Which term best describes a foundational idea in "Debugging A Heartbeat Loop: Observability, Replay, And Failure Modes"?
1. replay
2. beat trace
3. infinite loop
4. soul-state thrash
A learner studying Debugging A Heartbeat Loop: Observability, Replay, And Failure Modes would need to understand which concept?
1. beat trace
2. infinite loop
3. replay
4. soul-state thrash
Which of these is directly relevant to Debugging A Heartbeat Loop: Observability, Replay, And Failure Modes?
1. beat trace
2. replay
3. soul-state thrash
4. infinite loop
Which of the following is a key point about Debugging A Heartbeat Loop: Observability, Replay, And Failure Modes?
1. Per-beat trace: every beat logs its trigger, its model input, its tool calls, its memory deltas, and…
2. Beat timeline: a chart of beats per minute over time, so spikes and silences are visible at a glance
3. Soul state diff: snapshots of soul memory before/after each beat, browseable in the dashboard
4. Tool-call audit: an immutable log of every external action, with the beat ID that caused it
Which of these does NOT belong in a discussion of Debugging A Heartbeat Loop: Observability, Replay, And Failure Modes?
1. Per-beat trace: every beat logs its trigger, its model input, its tool calls, its memory deltas, and…
2. Soul state diff: snapshots of soul memory before/after each beat, browseable in the dashboard
3. Beat timeline: a chart of beats per minute over time, so spikes and silences are visible at a glance
4. trigger composition
Which statement is accurate regarding Debugging A Heartbeat Loop: Observability, Replay, And Failure Modes?
1. Pull the beat timeline; find the moment behavior changed
2. Replay the suspect beat in a sandbox with current code
3. Pause the soul — preserve state, stop the bleeding
4. Decide: is this a code fix, a config fix, a memory fix, or a trigger-logic fix? Fix the root cause, …
Which of these does NOT belong in a discussion of Debugging A Heartbeat Loop: Observability, Replay, And Failure Modes?
1. Replay the suspect beat in a sandbox with current code
2. trigger composition
3. Pull the beat timeline; find the moment behavior changed
4. Pause the soul — preserve state, stop the bleeding
What is the key insight about "Replay catches Heisenbugs" in the context of Debugging A Heartbeat Loop: Observability, Replay, And Failure Modes?
1. A flake in beat #43 of an interval soul is impossible to reproduce by 'running it again.'
2. trigger composition
3. rate limit
4. Before writing any heartbeat config, write its budget block first
What is the key insight about "Don't debug in production by guessing" in the context of Debugging A Heartbeat Loop: Observability, Replay, And Failure Modes?
1. trigger composition
2. If a heartbeat soul is misbehaving, the first move is to pause it (not kill it — pause preserves state for forensics).
3. rate limit
4. Before writing any heartbeat config, write its budget block first
What is the recommended tip about "Evaluate systematically" in the context of Debugging A Heartbeat Loop: Observability, Replay, And Failure Modes?
1. trigger composition
2. rate limit
3. Before adopting any AI tool: check the data policy, benchmark on your actual use cases, and plan an exit strategy.
4. Before writing any heartbeat config, write its budget block first
Which statement accurately describes an aspect of Debugging A Heartbeat Loop: Observability, Replay, And Failure Modes?
1. trigger composition
2. rate limit
3. Before writing any heartbeat config, write its budget block first
4. A reactive agent fails in front of the user — the bug is in the message you just got. A heartbeat soul fails while you're asleep.
What does working with Debugging A Heartbeat Loop: Observability, Replay, And Failure Modes typically involve?
1. The single best heartbeat debugging tool is replay — re-running a past beat against the current code, with the original trigger and memory s…
2. trigger composition
3. rate limit
4. Before writing any heartbeat config, write its budget block first
Which of the following is true about Debugging A Heartbeat Loop: Observability, Replay, And Failure Modes?
1. trigger composition
2. The classic OpenClaw infinite loop has two flavors. The first is a self-paced soul whose state-update logic accidentally always returns 'wak…
3. rate limit
4. Before writing any heartbeat config, write its budget block first
Which best describes the scope of "Debugging A Heartbeat Loop: Observability, Replay, And Failure Modes"?
1. It is unrelated to tools workflows
2. It applies only to the opposite beginner tier
3. It focuses on how heartbeats fail in ways reactive agents never do: silent drift, soul-state thrash, infinite loops.
4. It was deprecated in 2024 and no longer relevant


Tendril · Creators · Tools Literacy
