#21915: Cron: add startup session reaper timing metrics
Labels: `docs`, `size: S`, `trusted-contributor`
Cluster: Cron Job Stability Fixes
## Summary
- add startup session reaper sweep timing metrics in `CronService.start()`
- include aggregate counters (`storeCount`, `sweptStores`, `totalPruned`, `failedStores`, `elapsedMs`) and slowest-store timings (`slowestStorePath`, `slowestStoreElapsedMs`)
- add regression coverage for disabled-cron startup sweep to assert metrics log payload
- fix zero-duration edge case so `slowestStorePath` is populated when at least one store is swept
- keep local run artifacts out of PR scope
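Below is a minimal sketch of how the aggregate counters and slowest-store timings listed above could be collected. It is an illustration, not the code in `CronService.start()`. Only the metric field names come from this PR. `sweepWithMetrics`, the `SweepStore` callback, and the injectable `now` clock are hypothetical, and the sweep is synchronous for brevity. It also shows one way to handle the zero-duration edge case. `Date.now()` has millisecond granularity, so a fast store can measure 0 ms. Comparing against "no slowest store yet" rather than an initial `0` guarantees that the first swept store populates `slowestStorePath`.

```typescript
/** Aggregate metrics for one startup sweep; field names mirror the PR summary. */
interface SweepMetrics {
  storeCount: number;
  sweptStores: number;
  totalPruned: number;
  failedStores: number;
  elapsedMs: number;
  slowestStorePath?: string;
  slowestStoreElapsedMs?: number;
}

/**
 * Hypothetical per-store pruning callback: returns the number of sessions
 * pruned from the store at `storePath`, or throws on failure.
 */
type SweepStore = (storePath: string) => number;

function sweepWithMetrics(
  storePaths: string[],
  sweepStore: SweepStore,
  now: () => number = Date.now, // injectable clock (assumption), handy in tests
): SweepMetrics {
  const metrics: SweepMetrics = {
    storeCount: storePaths.length,
    sweptStores: 0,
    totalPruned: 0,
    failedStores: 0,
    elapsedMs: 0,
  };
  const sweepStart = now();

  for (const storePath of storePaths) {
    const storeStart = now();
    try {
      metrics.totalPruned += sweepStore(storePath);
      metrics.sweptStores += 1;
    } catch {
      // A failing store is counted but does not abort the whole sweep.
      metrics.failedStores += 1;
      continue;
    }
    const storeElapsedMs = now() - storeStart;
    // Compare against "no slowest store yet" instead of an initial 0 ms,
    // so a store that sweeps in 0 ms still populates slowestStorePath.
    if (
      metrics.slowestStoreElapsedMs === undefined ||
      storeElapsedMs > metrics.slowestStoreElapsedMs
    ) {
      metrics.slowestStorePath = storePath;
      metrics.slowestStoreElapsedMs = storeElapsedMs;
    }
  }

  metrics.elapsedMs = now() - sweepStart;
  return metrics;
}
```

Injecting `now` keeps the sketch deterministic under test. A frozen clock reproduces the zero-duration case without relying on timing luck.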
## Verification
- `pnpm exec vitest run src/cron/service.skips-main-jobs-empty-systemevent-text.test.ts src/cron/session-reaper.test.ts src/cron/service.issue-regressions.test.ts src/config/sessions.cache.test.ts`
- `pnpm exec oxlint src/cron/service/ops.ts src/cron/service.skips-main-jobs-empty-systemevent-text.test.ts src/cron/service/timer.ts src/cron/service/reaper-paths.ts src/config/sessions/store.ts src/config/sessions.cache.test.ts`
- `pnpm check` *(fails in this workspace at `format:check` because of large pre-existing unrelated formatting churn outside this patch)*
## Risk / compatibility
- no schema/config/API surface changes
- no behavior changes for cron job execution flow
- adds one informational startup log line and startup sweep timing metadata only
<!-- greptile_comment -->
<h3>Greptile Summary</h3>
This PR adds startup session reaper timing metrics to `CronService.start()` and includes comprehensive error handling infrastructure improvements.
**Main changes:**
- Added timing metrics for session reaper sweep on startup (store count, pruned sessions, elapsed time, slowest store tracking)
- Test coverage for disabled-cron startup sweep scenario validates metrics logging
- Fixed zero-duration edge case ensuring `slowestStorePath` is populated when at least one store is swept
**Additional changes (outside stated scope):**
- Complete domain error hierarchy with `DomainError` base class and specific implementations (`ProviderConnectionError`, `ProviderAuthError`, `ValidationError`, etc.)
- Result<T, E> pattern for railway-oriented programming
- Comprehensive error handling documentation (495 lines in `docs/learn/10-error-handling.md`)
- Android gateway server proposal documentation (3 new doc files)
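The summary names a `DomainError` base class and a `Result<T, E>` type for railway-oriented programming but does not show them. The sketch below illustrates the general pattern only and is not the code in `src/infra/errors.ts`. Apart from `DomainError` and `ValidationError`, every name here is hypothetical: the `code` field, `ok`, `err`, `andThen`, `parsePort`, and `requireUnprivileged`. Each step returns either a success or a typed failure. `andThen` skips the remaining steps once a failure appears, and this is the "railway" the term refers to.

```typescript
/** Base class for expected, domain-level failures (hypothetical shape). */
class DomainError extends Error {
  readonly code: string;
  constructor(message: string, code: string) {
    super(message);
    this.name = new.target.name;
    this.code = code;
  }
}

class ValidationError extends DomainError {
  constructor(message: string) {
    super(message, "VALIDATION");
  }
}

/** A value is either a success ("ok: true") or a typed failure ("ok: false"). */
type Result<T, E extends DomainError = DomainError> =
  | { ok: true; value: T }
  | { ok: false; error: E };

const ok = <T>(value: T): Result<T, never> => ({ ok: true, value });
const err = <E extends DomainError>(error: E): Result<never, E> => ({ ok: false, error });

/** Run `next` only if the previous step succeeded; otherwise pass the failure through. */
function andThen<T, U, E extends DomainError>(
  result: Result<T, E>,
  next: (value: T) => Result<U, E>,
): Result<U, E> {
  return result.ok ? next(result.value) : result;
}

// Two hypothetical validation steps chained on the "railway".
function parsePort(raw: string): Result<number, ValidationError> {
  const port = Number(raw);
  return Number.isInteger(port) && port > 0 && port < 65536
    ? ok(port)
    : err(new ValidationError(`invalid port: ${raw}`));
}

function requireUnprivileged(port: number): Result<number, ValidationError> {
  return port >= 1024 ? ok(port) : err(new ValidationError(`privileged port: ${port}`));
}
```

Expected failures travel as values instead of thrown exceptions, so callers must handle the `ok: false` branch explicitly. Unexpected errors can still be thrown.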
**Concerns:**
The PR description focuses on cron metrics but includes substantial error handling refactoring and documentation that appears unrelated to the stated purpose. This bundling of unrelated changes makes review and rollback more difficult.
<h3>Confidence Score: 3/5</h3>
- Safe to merge but requires follow-up discussion about PR scope
- The cron metrics implementation is well-tested and follows existing patterns. However, the PR bundles unrelated changes (error handling refactor, documentation) that weren't mentioned in the title or summary, making it harder to review atomically and increasing the risk of unintended side effects.
- Review `src/infra/errors.ts` and `src/infra/unhandled-rejections.ts` for the error handling changes, which appear unrelated to the session reaper metrics.
<sub>Last reviewed commit: aa98ea6</sub>
<!-- greptile_other_comments_section -->
<!-- /greptile_comment -->
## Most Similar PRs
- #17064: fix(cron): prevent control-plane starvation during startup catch-up... · by donggyu9208 · 2026-02-15 · 80.4%
- #13055: fix: prevent cron RPC stalls with timeout and caching (#13018) · by trevorgordon981 · 2026-02-10 · 78.7%
- #12303: fix(cron): correct nextRunAtMs calculation and prevent timer stall · by colddonkey · 2026-02-09 · 77.6%
- #18144: fix(cron): clear stuck runningAtMs after timeout and add maintenanc... · by taw0002 · 2026-02-16 · 77.0%
- #10829: fix: prevent cron scheduler permanent death on transient startup/ru... · by meaadore1221-afk · 2026-02-07 · 77.0%
- #18743: Cron Tool Hardening: Normalize Gateway Params and Enforce Valid Sch... · by cccat6 · 2026-02-17 · 76.9%
- #13065: fix(cron): Fix "every" schedule not re-arming after gateway restart · by trevorgordon981 · 2026-02-10 · 76.8%
- #20329: Fix cron.run WS blocking and harden delivery recovery · by guirguispierre · 2026-02-18 · 76.0%
- #23562: feat: add sessionFreshness config for isolated cron jobs (#23539) · by MunemHashmi · 2026-02-22 · 75.6%
- #8698: fix(cron): default enabled to true for new jobs · by emmick4 · 2026-02-04 · 75.5%