#11804: fix(webhook): return 503 from health endpoints when last processing failed

by coygeek open 2026-02-08 10:23
channel: nextcloud-talk channel: telegram stale
## Fix Summary

Track the last webhook processing error in both the Telegram and Nextcloud Talk webhook servers. The `/healthz` endpoint now returns `503 Service Unavailable` with error details when the most recent processing attempt failed, and `200 OK` only when the last processing succeeded (or no processing has occurred yet).

This addresses a blind spot where orchestrators (Kubernetes, Docker Swarm) could not detect unhealthy webhook instances because `/healthz` always returned 200, causing user messages to be silently dropped.

## Issue Linkage

Fixes #11803

## Security Snapshot

- CVSS v3.1: 7.5 (High)
- CVSS v4.0: 8.7 (High)

## Implementation Details

### Files Changed

- `extensions/nextcloud-talk/src/monitor.ts` (+12/-1)
- `src/telegram/webhook.ts` (+9/-0)

### Technical Analysis

- **`src/telegram/webhook.ts`**: Added `lastWebhookError` state tracking. Cleared on successful handler completion, set on handler failure. Health endpoint checks this before responding.
- **`extensions/nextcloud-talk/src/monitor.ts`**: Same pattern applied inside `createNextcloudTalkWebhookServer`. Both `onMessage` errors and outer request processing errors update the state.

This follows the existing pattern in `src/gateway/server-methods/health.ts` where the gateway health endpoint checks dependencies via `refreshHealthSnapshot()` before responding.

## Validation Evidence

- Command: `pnpm build`
- Status: passed

## Risk and Compatibility

| Change | Risk | Mitigation |
|--------|------|------------|
| Health returns 503 after processing error | Orchestrators may restart pods more aggressively | Error clears on next successful processing; no false positives on idle bots |
| New JSON response body on 503 | Monitoring tools parsing response body | Only applies to 503 path; 200 response unchanged |

## AI-Assisted Disclosure

This fix was generated with AI assistance (Claude Opus 4.6).

---

🤖 Generated with [Claude Code](https://claude.ai/code)

<!-- greptile_comment -->

<h2>Greptile Overview</h2>

<h3>Greptile Summary</h3>

This PR adds `lastWebhookError` tracking to the Telegram and Nextcloud Talk webhook servers and changes their `/healthz` endpoints to return `503` when the last webhook processing attempt failed (and `200` otherwise). This helps orchestrators detect unhealthy webhook instances rather than always reporting OK.

The core behavior change is localized to the two webhook servers; no shared health-check infrastructure was modified.

<h3>Confidence Score: 4/5</h3>

- This PR is close to safe to merge, but the new health responses can leak internal error details.
- Change is small and test/build reportedly pass, but returning raw `lastWebhookError` over an unauthenticated health endpoint is a concrete information disclosure footgun in both servers. If error messages can include request/user content or operational details, they become externally visible.
- src/telegram/webhook.ts, extensions/nextcloud-talk/src/monitor.ts

<!-- greptile_other_comments_section -->

<!-- /greptile_comment -->
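The state tracking described in the PR, together with the disclosure concern raised in the review, can be illustrated with a minimal sketch. This is not the PR's code: only the `lastWebhookError` name and the `/healthz` semantics (503 after a failed attempt, 200 otherwise or when idle) come from the text above. The class name `WebhookHealthTracker`, the `exposeDetails` option, the `"ok"` body for 200, and the JSON shape of the 503 body are all assumptions made for illustration.

```typescript
// Hypothetical sketch. Only `lastWebhookError` and the 200/503 semantics
// come from the PR description; every other name and body shape is assumed.
type HealthResponse = { status: 200 | 503; body: string };

class WebhookHealthTracker {
  // null means the last processing succeeded, or nothing was processed yet.
  private lastWebhookError: string | null = null;
  // Assumed option: whether the 503 body includes the raw error message.
  private readonly exposeDetails: boolean;

  constructor(exposeDetails: boolean = true) {
    this.exposeDetails = exposeDetails;
  }

  // Call after a webhook handler completes successfully.
  recordSuccess(): void {
    this.lastWebhookError = null;
  }

  // Call from the catch block around a webhook handler.
  recordFailure(err: unknown): void {
    this.lastWebhookError = err instanceof Error ? err.message : String(err);
  }

  // What a /healthz route would send back.
  health(): HealthResponse {
    if (this.lastWebhookError === null) {
      return { status: 200, body: "ok" };
    }
    const payload = this.exposeDetails
      ? { status: "unhealthy", error: this.lastWebhookError }
      : { status: "unhealthy" };
    return { status: 503, body: JSON.stringify(payload) };
  }
}
```

With `exposeDetails` left at `true`, the sketch mirrors the behavior the PR describes (error details in the 503 body). Passing `false` shows one possible way to address the review's concern: the orchestrator still sees a 503, while the raw message stays out of the unauthenticated response and would be written to server logs instead.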
