#23722: fix: avoid false positive reminder hallucination detection
size: S
## Problem
When the agent's reply confirmed that a reminder had actually been scheduled (e.g., "Reminder is set" or "Cron job created"), the guard note about unbacked reminder commitments was still appended. The note then contradicted the reply and confused the user.
For example, this text would trigger the false positive:
> I'll remind you tomorrow. Reminder is set for 9am.
With the guard note appended, the reply became:
> I'll remind you tomorrow. Reminder is set for 9am.
>
> Note: I did not schedule a reminder in this turn...
## Solution
Added `REMINDER_CONFIRMED_PATTERNS` to detect phrases that confirm a reminder was scheduled:
- `reminder is set`
- `reminder has been created/scheduled/added`
- `cron job created/scheduled/added`
- `scheduled for [time]`
When any of these patterns matches the response text, the guard note is skipped.
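A minimal TypeScript sketch of how the confirmation check could slot into the guard. It assumes `hasUnbackedReminderCommitment()` receives the reply text and the `successfulCronAdds` count. The function name and counter come from the review below, but the parameter list, the commitment regex, and the exact confirmation regexes are illustrative assumptions, not the merged code.

```typescript
// Sketch only: the regexes below are illustrative assumptions, not the
// patterns merged into src/auto-reply/reply/agent-runner.ts.

// Hypothetical commitment detector: phrases promising a future reminder.
const REMINDER_COMMITMENT_PATTERNS: RegExp[] = [
  /\bI(?:'ll| will) remind you\b/i,
];

// Confirmation phrases, mirroring the list in this PR description.
// No /g flag, so RegExp.test() carries no lastIndex state between calls.
const REMINDER_CONFIRMED_PATTERNS: RegExp[] = [
  /\breminder is set\b/i,
  /\breminder has been (?:created|scheduled|added)\b/i,
  /\bcron job (?:created|scheduled|added)\b/i,
  /\bscheduled for\s+\S+/i,
];

// Assumed parameters; only the function name and successfulCronAdds
// appear in the review.
function hasUnbackedReminderCommitment(
  text: string,
  successfulCronAdds: number,
): boolean {
  // A cron.add call actually succeeded this turn: the commitment is backed.
  if (successfulCronAdds > 0) return false;
  // No promise to remind, so there is nothing to guard.
  if (!REMINDER_COMMITMENT_PATTERNS.some((re) => re.test(text))) return false;
  // New in this PR: the reply text itself claims the reminder was scheduled.
  if (REMINDER_CONFIRMED_PATTERNS.some((re) => re.test(text))) return false;
  return true;
}

// The example from the Problem section no longer triggers the guard note.
console.log(
  hasUnbackedReminderCommitment("I'll remind you tomorrow. Reminder is set for 9am.", 0),
); // false
console.log(hasUnbackedReminderCommitment("I'll remind you tomorrow.", 0)); // true
```

Because the confirmation check runs only when `successfulCronAdds` is 0, a reply that merely claims "Reminder is set" without any scheduling action takes the same path.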
## Testing
Added 2 test cases:
- `skips guard note when text confirms reminder was scheduled`
- `skips guard note when text mentions cron job was created`
All existing tests pass.
<h3>Greptile Summary</h3>
This PR adds text-based pattern matching (`REMINDER_CONFIRMED_PATTERNS`) to suppress the reminder hallucination guard note when the agent's response contains phrases like "Reminder is set" or "Cron job created." It also includes minor style cleanups (em-dash → dash in comments).
- **Core concern:** The guard note exists to catch LLM hallucinations — cases where the model claims it scheduled a reminder but didn't. By trusting the LLM's own text to suppress this guard (via `REMINDER_CONFIRMED_PATTERNS`), an LLM that hallucinates "Reminder is set" will now bypass the safety net. The `successfulCronAdds` counter is an action-based signal that reliably tracks actual `cron.add` tool execution; the new text patterns are inherently less reliable since they trust the same output the guard is meant to second-guess.
- **Legitimate use case:** The comment mentions `exec + openclaw cron add` as a path where reminders could be scheduled without incrementing `successfulCronAdds`. If this is the primary motivator, consider tracking exec-based cron invocations at the tool-execution level rather than relying on text matching.
- **Tests:** Two new test cases added, both well-structured but only covering the happy path (guard suppression with `successfulCronAdds: 0`). No adversarial test covers the hallucination scenario.
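The review's alternative, which moves the signal from reply text to tool execution, can be sketched as follows. This is a minimal TypeScript sketch under stated assumptions: the `ToolExecution` record, `countScheduledReminders`, `shouldAppendGuardNote`, the tool names, and the commitment regex are hypothetical. Only `cron.add` and the `openclaw cron add` command come from the review. The usage lines also encode the adversarial hallucination case the review says is untested.

```typescript
// Sketch of the review's suggestion: back the guard with tool-execution
// records rather than phrases in the model's reply. The ToolExecution shape
// and tool names are assumptions; only cron.add and the `openclaw cron add`
// command are mentioned in the review.

interface ToolExecution {
  tool: string; // assumed names, e.g. "cron.add" or "exec"
  command?: string; // shell command, for exec-style tools
  succeeded: boolean;
}

// Matches an exec invocation of the CLI subcommand named in the review.
const EXEC_CRON_ADD = /\bopenclaw\s+cron\s+add\b/;

// Hypothetical commitment detector (assumed regex).
const REMINDER_COMMITMENT = /\bI(?:'ll| will) remind you\b/i;

// Counts successful scheduling actions: direct cron.add calls plus exec
// calls whose command runs `openclaw cron add`.
function countScheduledReminders(executions: ToolExecution[]): number {
  return executions.filter(
    (e) =>
      e.succeeded &&
      (e.tool === "cron.add" ||
        (e.tool === "exec" && EXEC_CRON_ADD.test(e.command ?? ""))),
  ).length;
}

// Reply text only decides whether a commitment was made; whether it was
// kept is decided solely by recorded actions.
function shouldAppendGuardNote(
  text: string,
  executions: ToolExecution[],
): boolean {
  return REMINDER_COMMITMENT.test(text) && countScheduledReminders(executions) === 0;
}

// Adversarial case: the reply claims success but no tool ran.
const claimed = "I'll remind you tomorrow. Reminder is set for 9am.";
console.log(shouldAppendGuardNote(claimed, [])); // true
// exec-based scheduling is counted without any text matching.
console.log(
  shouldAppendGuardNote(claimed, [
    { tool: "exec", command: "openclaw cron add", succeeded: true },
  ]),
); // false
```

With this design, a phrase like "Reminder is set" never suppresses the note on its own, so a hallucinated confirmation still surfaces. The trade-off is that every legitimate scheduling path has to be recorded as an action.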
<h3>Confidence Score: 2/5</h3>
- This PR weakens a hallucination guard by trusting LLM text output, which risks trading the original false positives for false negatives.
- Score of 2 reflects the fundamental tension between fixing false positives and introducing false negatives. The text-based confirmation patterns can be hallucinated just as easily as the commitment patterns they're meant to counterbalance. While the change is well-tested for the intended scenario, it lacks adversarial testing and relies on a fundamentally unreliable signal (LLM text) to override a safety mechanism.
- Pay close attention to `src/auto-reply/reply/agent-runner.ts` — the `hasUnbackedReminderCommitment()` function now trusts LLM text patterns to suppress its own hallucination guard.
<sub>Last reviewed commit: e8456c0</sub>
Most Similar PRs
- #23184: fix(auto-reply): prevent reminder guard note from leaking into chan... · by lailoo · 2026-02-22 · 73.9%
- #8307: fix(cron): improve tool description with reliable reminder guidance · by vishaltandale00 · 2026-02-03 · 73.9%
- #19916: fix: strict silent-reply detection to prevent false positives with ... · by hayoial · 2026-02-18 · 72.2%
- #8086: feat(security): Add prompt injection guard rail · by bobbythelobster · 2026-02-03 · 72.1%
- #13318: fix(agents): prevent sanitizeUserFacingText from rewriting conversa... · by hleliofficiel · 2026-02-10 · 72.1%
- #15896: fix(memory-lancedb): capture even with injected recall context · by aelaguiz · 2026-02-14 · 71.9%
- #8097: fix: auto-convert one-shot reminders for reliable delivery · by Gerrald12312 · 2026-02-03 · 71.9%
- #17743: fix(agents): disable orphaned user message deletion that causes ses... · by clawrl3000 · 2026-02-16 · 71.5%
- #16733: fix(ui): avoid injected newlines when tool output is hidden · by jp117 · 2026-02-15 · 71.5%
- #6522: fix(cron): deliver original message when agent response is heartbea... · by sidmohan0 · 2026-02-01 · 71.3%