
#17336: fix(gateway): restore device token priority over passive config token

by milosm open 2026-02-15 17:18
gateway size: M
Commit d8a2c80cd (v2026.2.14) flipped the token priority in GatewayClient so that `this.opts.token` (which includes passive env/config tokens) always wins over stored device-auth tokens. It also removed the self-healing fallback that clears stale device tokens on connection failure.

This broke device auth for clients that have both a stored device token AND a config/env token, particularly on LAN gateways, systemd services with stale env vars, and browsers with cached tokens.

The original intent was valid: explicit CLI `--token` should override stored device tokens. The problem is that `this.opts.token` doesn't distinguish explicit CLI flags from passive config/env fallback.

Fix: introduce `explicitToken` in GatewayClientOptions. The three-tier priority is now: explicitToken > storedToken > token (passive). Restore `canFallbackToShared` and the `clearDeviceAuthToken` call in the `.catch()` handler for self-healing when a stale device token fails. Update call sites (call.ts, acp/server.ts, tui/gateway-chat.ts) to pass explicit CLI `--token` values as `explicitToken` while keeping env/config tokens as `token`.
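The priority order and the self-healing step described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code in `src/gateway/client.ts`: `TokenOptions`, `DeviceTokenStore`, `selectAuthToken`, and `connectWithSelfHealing` are hypothetical names, and the exact condition behind `canFallbackToShared` is assumed here to be "the stored device token was the one sent, and a passive token exists to fall back to".

```typescript
// Hypothetical sketch. Only explicitToken, token, storedToken and
// canFallbackToShared are names taken from the PR text; everything
// else is illustrative.
interface TokenOptions {
  explicitToken?: string; // explicit CLI --token
  token?: string;         // passive config/env token
}

interface DeviceTokenStore {
  load(): string | undefined;
  clear(): void;
}

// Three-tier priority: explicitToken > storedToken > token (passive).
function selectAuthToken(opts: TokenOptions, storedToken?: string): string | undefined {
  return opts.explicitToken ?? storedToken ?? opts.token;
}

// Assumed self-healing: when the stored device token was sent and a passive
// token is available, a failed connect clears the stored token so the next
// attempt falls through to the passive token. The error is still rethrown.
async function connectWithSelfHealing(
  opts: TokenOptions,
  store: DeviceTokenStore,
  connect: (token: string | undefined) => Promise<void>,
): Promise<void> {
  const storedToken = store.load();
  const authToken = selectAuthToken(opts, storedToken);
  const usedStored = opts.explicitToken === undefined && storedToken !== undefined;
  const canFallbackToShared = usedStored && opts.token !== undefined;
  try {
    await connect(authToken);
  } catch (err) {
    if (canFallbackToShared) {
      store.clear();
    }
    throw err;
  }
}
```

In this sketch, a client with `{ token: "env" }` and a stored device token sends the device token first; if that attempt fails, the stored token is cleared and the next attempt sends the passive token. A call with `explicitToken` set never touches the store.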
Fixes #17270

## Summary

- **Problem:** `d8a2c80cd` flipped token priority so passive config/env tokens override stored device tokens, breaking device auth for non-localhost clients
- **Why it matters:** Breaks all LAN-bound gateways, systemd installs with env var drift, and browsers with cached device tokens; at least 4 open issues (#16820, #16862, #17223, #17233)
- **What changed:** Introduced `explicitToken` field in `GatewayClientOptions` for three-tier priority (`explicitToken > storedToken > token`), restored `canFallbackToShared` self-healing, updated call sites to split explicit CLI `--token` from passive config/env tokens
- **What did NOT change:** Server-side auth logic, device pairing flow, token generation, password/trusted-proxy auth modes

## Change Type (select all)

- [x] Bug fix
- [ ] Feature
- [ ] Refactor
- [ ] Docs
- [ ] Security hardening
- [ ] Chore/infra

## Scope (select all touched areas)

- [x] Gateway / orchestration
- [ ] Skills / tool execution
- [x] Auth / tokens
- [ ] Memory / storage
- [ ] Integrations
- [ ] API / contracts
- [ ] UI / DX
- [ ] CI/CD / infra

## Linked Issue/PR

- Closes #17270
- Related #16820, #16862, #17223, #17233
- Alternative fix: #17279 (full revert to v2026.2.13 behavior, also viable)

## User-visible / Behavior Changes

- Previously paired devices on LAN/remote connections will authenticate correctly again (restores v2026.2.13 behavior)
- Explicit `--token` CLI flag still overrides stored device tokens (preserves v2026.2.14 intent)
- Stale device tokens are automatically cleared on connection failure, allowing self-healing recovery on next attempt

## Security Impact (required)

- New permissions/capabilities? No
- Secrets/tokens handling changed? Yes (token selection priority changed)
- New/changed network calls? No
- Command/tool execution surface changed? No
- Data access scope changed?
No
- If any Yes, explain risk + mitigation: Token priority is restored to the pre-regression order for passive tokens (device token wins over config token). Explicit CLI `--token` still overrides everything, which is the expected security posture: a user deliberately passing a token should win. The self-healing `clearDeviceAuthToken` on failure is also restored from v2026.2.13, which is safe: it only clears the local stored token, forcing re-authentication on the next attempt.

## Repro + Verification

### Environment

- OS: Ubuntu 24.04 LTS (Hyper-V VM)
- Runtime/container: Node v22.22.0, pnpm
- Model/provider: N/A
- Integration/channel: Gateway WebSocket (CLI, TUI, Nodes, Web UI)
- Relevant config: `gateway.bind=lan`, `gateway.auth.mode=token`

### Steps

1. On v2026.2.13, configure `gateway.bind=lan` and pair a device (Node, CLI, or browser)
2. Verify connection works
3. Upgrade to v2026.2.14: device connection fails with `unauthorized: device token mismatch`
4. Apply this fix: device connection works again

### Expected

- Paired devices authenticate via stored device tokens regardless of config token presence

### Actual

- Config token sent instead of device token, causing `unauthorized: device token mismatch`

## Evidence

- [x] Failing test/log before + passing after
- [ ] Trace/log snippets
- [ ] Screenshot/recording
- [ ] Perf numbers (if relevant)

New test file `src/gateway/client.token-priority.test.ts` (5 tests):

- `prefers explicitToken over stored token` ✅
- `prefers stored token over passive token` ✅ (this is the regression case)
- `uses passive token when no stored token exists` ✅
- `uses undefined token when nothing is configured` ✅
- `clears stored token on connect failure when shared auth fallback is available` ✅

Existing tests updated and passing: `call.test.ts`, `gateway-chat.test.ts`

Full `pnpm test` run across all three suites:

- **Gateway suite:** 390 passed, 1 failed (pre-existing: `server-runtime-config.test.ts`)
- **Unit suite:** 1064 passed, 9
failed (pre-existing: `browser/server.post-tabs-open-profile-unknown-returns-404.test.ts`, `web/media.test.ts`, `bluebubbles/media-send.test.ts` + heap OOM)
- **Integration suite:** 202 passed, 2 failed (pre-existing: `browser/server` + `web/media`)

**None of the failures are in files we touched.** Our 31 tests (5 new + 26 updated existing) all pass:

```
✓ src/gateway/client.token-priority.test.ts (5 tests) 16ms
✓ src/tui/gateway-chat.test.ts (5 tests) 7ms
✓ src/gateway/call.test.ts (21 tests) 24ms
```

## Human Verification (required)

- Verified scenarios: `pnpm build` passes, `pnpm check` (format + lint) passes, all 31 tests in files we touched pass, new token priority tests cover all priority combinations + self-healing
- Edge cases checked: No `explicitToken` set (falls through correctly), no `storedToken` (falls through to passive), neither set (undefined), both explicit + stored set (explicit wins)
- Pre-existing failures on `main`: 12 test failures across 4 test files, all unrelated to our changes (browser profile CRUD, media local roots, BlueBubbles media send, server-runtime-config)
- What you did not verify: Live end-to-end test with actual LAN gateway + paired device (verified via code analysis and unit tests only). Our production instance is on v2026.2.13 rollback.

## Compatibility / Migration

- Backward compatible? Yes
- Config/env changes? No
- Migration needed?
No
- If yes, exact upgrade steps: N/A (drop-in fix, no config changes needed)

## Failure Recovery (if this breaks)

- How to disable/revert this change quickly: Revert this single commit; the change is self-contained in `client.ts` + call sites
- Files/config to restore: `src/gateway/client.ts`, `src/gateway/call.ts`, `src/acp/server.ts`, `src/tui/gateway-chat.ts`
- Known bad symptoms reviewers should watch for: `unauthorized: device token mismatch` errors, or explicit `--token` CLI flag not overriding stored device tokens

## Relationship to #17279

PR #17279 fixes the same regression using **Option A: full revert**, restoring `storedToken ?? this.opts.token` and the self-healing mechanism exactly as they were in v2026.2.13. That's a perfectly viable fix and the simplest path forward.

This PR takes **Option B: introduce `explicitToken`**, which also fixes the regression but additionally preserves the intended behavior from `d8a2c80cd`: explicit CLI `--token` flags still override stored device tokens. The tradeoff is a slightly larger change (new field threaded through call sites) for a more precise fix.

Either PR resolves #17270 and the related issues. Maintainers should pick whichever approach they prefer.

## Risks and Mitigations

- Risk: If a call site passes a passive config token as `explicitToken` by mistake, device tokens would be bypassed for that path
- Mitigation: Only two call sites set `explicitToken` (call.ts and acp/server.ts), both only from the explicit CLI `--token` / `opts.gatewayToken` path. All other call sites (runner.ts, probe.ts, discord/exec-approvals.ts) don't set it.

---

*AI-assisted: Analysis by Claude Opus 4.6 (high thinking), implementation by Codex 5.3 (high thinking), verification and PR by Claude Opus 4.6. Fully tested.
We understand what the code does.*

<!-- greptile_comment -->

<h3>Greptile Summary</h3>

This PR correctly fixes a token priority regression introduced in `d8a2c80cd` where passive config/env tokens were incorrectly overriding stored device tokens, breaking LAN gateways and paired devices.

**Key changes:**

- Introduces `explicitToken` field in `GatewayClientOptions` to distinguish explicit CLI `--token` flags from passive config/env tokens
- Implements three-tier token priority: `explicitToken > storedToken > token` (passive)
- Restores `canFallbackToShared` self-healing mechanism that clears stale device tokens on connection failure
- Updates call sites (`call.ts`, `acp/server.ts`, `tui/gateway-chat.ts`) to pass CLI tokens as `explicitToken` while keeping config/env tokens as `token`
- Adds comprehensive test coverage for all priority scenarios

**Implementation quality:**

- The logic in `client.ts:194-200` correctly implements the three-tier priority
- Self-healing mechanism at `client.ts:279-284` properly restores the v2026.2.13 behavior
- All call sites correctly distinguish explicit vs passive tokens
- Test coverage is thorough and validates all priority combinations plus self-healing

<h3>Confidence Score: 5/5</h3>

- This PR is safe to merge with high confidence - it correctly fixes a critical auth regression with excellent test coverage and no security risks.
- The implementation is sound: (1) correctly implements three-tier token priority, (2) properly restores self-healing mechanism, (3) all call sites correctly distinguish explicit vs passive tokens, (4) comprehensive test coverage validat...
