
#23722: fix: avoid false positive reminder hallucination detection

by openjinx99-debug open 2026-02-22 16:36
size: S
## Problem

When the agent's response text confirms a reminder was actually scheduled (e.g., 'Reminder is set' or 'Cron job created'), the guard note about unbacked reminder commitments was still being appended, causing confusion.

For example, this text would trigger the false positive:

> I'll remind you tomorrow. Reminder is set for 9am.

Would become:

> I'll remind you tomorrow. Reminder is set for 9am.
>
> Note: I did not schedule a reminder in this turn...

## Solution

Added `REMINDER_CONFIRMED_PATTERNS` to detect phrases that confirm a reminder was scheduled:

- `reminder is set`
- `reminder has been created/scheduled/added`
- `cron job created/scheduled/added`
- `scheduled for [time]`

When any of these patterns are found, the guard note is skipped.

## Testing

Added 2 test cases:

- `skips guard note when text confirms reminder was scheduled`
- `skips guard note when text mentions cron job was created`

All existing tests pass.

### Greptile Summary

This PR adds text-based pattern matching (`REMINDER_CONFIRMED_PATTERNS`) to suppress the reminder hallucination guard note when the agent's response contains phrases like "Reminder is set" or "Cron job created." It also includes minor style cleanups (em-dash → dash in comments).

- **Core concern:** The guard note exists to catch LLM hallucinations: cases where the model claims it scheduled a reminder but didn't. By trusting the LLM's own text to suppress this guard (via `REMINDER_CONFIRMED_PATTERNS`), an LLM that hallucinates "Reminder is set" will now bypass the safety net. The `successfulCronAdds` counter is an action-based signal that reliably tracks actual `cron.add` tool execution; the new text patterns are inherently less reliable since they trust the same output the guard is meant to second-guess.
- **Legitimate use case:** The comment mentions `exec + openclaw cron add` as a path where reminders could be scheduled without incrementing `successfulCronAdds`. If this is the primary motivator, consider tracking exec-based cron invocations at the tool-execution level rather than relying on text matching.
- **Tests:** Two new test cases added, both well-structured but only covering the happy path (guard suppression with `successfulCronAdds: 0`). No adversarial test covers the hallucination scenario.

### Confidence Score: 2/5

- This PR weakens a hallucination guard by trusting LLM text output, which risks re-introducing the original false-positive problem in reverse (false negatives).
- Score of 2 reflects the fundamental tension between fixing false positives and introducing false negatives. The text-based confirmation patterns can be hallucinated just as easily as the commitment patterns they're meant to counterbalance. While the change is well-tested for the intended scenario, it lacks adversarial testing and relies on a fundamentally unreliable signal (LLM text) to override a safety mechanism.
- Pay close attention to `src/auto-reply/reply/agent-runner.ts`: the `hasUnbackedReminderCommitment()` function now trusts LLM text patterns to suppress its own hallucination guard.

Last reviewed commit: e8456c0
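To make the trade-off discussed in the review concrete, here is a minimal TypeScript sketch of both approaches. It is not the code from this PR: the regexes are guessed from the phrase list in the description, and the function signatures, the `ToolCallRecord` shape, and the commitment pattern are all hypothetical. Only the names `REMINDER_CONFIRMED_PATTERNS`, `successfulCronAdds`, `cron.add`, and `openclaw cron add` come from the text above.

```typescript
// ASSUMPTION: illustrative regexes reconstructed from the PR description;
// the real patterns in agent-runner.ts may differ.
const REMINDER_COMMITMENT_PATTERNS: RegExp[] = [/\bI(?:'ll| will) remind you\b/i];

const REMINDER_CONFIRMED_PATTERNS: RegExp[] = [
  /\breminder is set\b/i,
  /\breminder has been (?:created|scheduled|added)\b/i,
  /\bcron job (?:created|scheduled|added)\b/i,
  /\bscheduled for\b/i,
];

const matchesAny = (patterns: RegExp[], text: string): boolean =>
  patterns.some((pattern) => pattern.test(text));

// Text-based guard, roughly as the PR describes it (hypothetical signature).
// Returns true when the guard note should be appended.
function hasUnbackedReminderCommitmentByText(
  text: string,
  successfulCronAdds: number,
): boolean {
  if (successfulCronAdds > 0) return false; // a real cron.add backs the promise
  if (!matchesAny(REMINDER_COMMITMENT_PATTERNS, text)) return false;
  // New in the PR: confirming phrases in the model's own text suppress the note.
  return !matchesAny(REMINDER_CONFIRMED_PATTERNS, text);
}

// ASSUMPTION: hypothetical record of a tool call made during the turn.
interface ToolCallRecord {
  tool: string;
  command?: string;
  succeeded: boolean;
}

// Action-based alternative suggested in the review: count successful
// cron.add calls and exec calls that ran `openclaw cron add`.
function countScheduledReminders(calls: ToolCallRecord[]): number {
  return calls.filter(
    (call) =>
      call.succeeded &&
      (call.tool === "cron.add" ||
        (call.tool === "exec" &&
          /\bopenclaw\s+cron\s+add\b/.test(call.command ?? ""))),
  ).length;
}

// Guard that ignores what the text claims and looks only at tool activity.
function hasUnbackedReminderCommitmentByAction(
  text: string,
  calls: ToolCallRecord[],
): boolean {
  return (
    matchesAny(REMINDER_COMMITMENT_PATTERNS, text) &&
    countScheduledReminders(calls) === 0
  );
}

const reply = "I'll remind you tomorrow. Reminder is set for 9am.";

// No tool ran, yet the text-based guard stays silent (the false negative).
console.log(hasUnbackedReminderCommitmentByText(reply, 0)); // false
// The action-based guard still flags the same reply.
console.log(hasUnbackedReminderCommitmentByAction(reply, [])); // true
```

With the text-based check, a genuinely scheduled reminder and a hallucinated "Reminder is set" produce the same result, which is the adversarial case the review says is untested. The action-based variant separates them, at the cost of having to recognize every path that can schedule a reminder.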
