#19670: fix(config): guard config.apply against catastrophic key loss

by nabbilkhan open 2026-02-18 02:40 View on GitHub →

app: macos app: web-ui gateway agents size: S

## Summary - **Problem**: `config.apply` replaces the entire config file with whatever the caller sends, with no check that the incoming payload preserves the existing structure. A 65-byte stub can silently overwrite a 4 KB production config (8 agents, all channel configs, auth profiles), leaving the gateway in a crash-loop with no automatic recovery path. - **Root cause**: `src/gateway/server-methods/config.ts` — the handler validates params, verifies the base hash, parses the incoming payload, then immediately calls `writeConfigFile()`. There is no semantic check comparing the incoming top-level sections to the existing ones before writing. - **What changed**: Before `writeConfigFile()`, the handler now computes which existing top-level sections would be dropped. If more than 50% would be dropped and `allowDestructive: true` is not present in the request, the write is rejected with an error that names exactly which sections are at risk and how to proceed. - **What did NOT change**: `config.patch`, `config.set`, write audit in `src/config/io.ts`, config schema validation, restart sentinel logic — none touched. ## Change Type - [x] Bug fix - [x] Security hardening ## Scope - [x] Gateway / orchestration - [x] API / contracts ## Linked Issue/PR - Closes #6395 ## User-visible / Behavior Changes `config.apply` now rejects requests that would drop more than half of the existing top-level config sections, unless `allowDestructive: true` is included in the request params. The error message is explicit: ``` config.apply would drop 5 of 6 top-level sections: [agents, channels, auth, session, memory]. Pass allowDestructive: true to confirm this is intentional. ``` Clients that legitimately want a clean-slate replacement pass `allowDestructive: true` — one explicit field, no ambiguity. ## Security Impact - New permissions/capabilities? No - Secrets/tokens handling changed? No - New/changed network calls? No - Command/tool execution surface changed? No - Data access scope changed? No ## Repro + Verification ### Environment - OS: Ubuntu 24.04 LTS - Runtime/container: Node 22.x - Model/provider: N/A (gateway config path) - Integration/channel: N/A - Relevant config: production-style config with 6 top-level sections ### Steps 1. Start gateway with a production config containing `gateway`, `agents`, `channels`, `auth`, `session`, `memory` top-level keys. 2. Call `config.apply` with `raw: '{"gateway":{"mode":"local"}}'` (65 bytes, drops 5 of 6 sections). 3. **Before fix**: config is overwritten, gateway crash-loops on restart. 4. **After fix**: request is rejected with a descriptive error; original config is untouched. ### Expected - Stub apply rejected with error naming dropped sections + `allowDestructive` escape hatch. ### Actual (before fix) - Config silently overwritten. Gateway restart fails. Manual recovery required. ## Evidence - [x] Failing test before fix + passing after 5 new test cases, all passing: 1. **Happy path** — full config applied → `ok: true` 2. **Partial update** — drops <50% of sections → `ok: true` 3. **Destructive guard triggered** — drops >50% of sections → `ok: false`, error names dropped sections and mentions `allowDestructive` 4. **`allowDestructive` bypass** — same stub with `allowDestructive: true` → `ok: true` 5. **Boundary** — exactly 50% dropped → `ok: true` (guard fires at strictly >50%) All 7 tests in `server.config-apply.e2e.test.ts` pass. All 4,847 existing tests pass (1 pre-existing failure in `src/browser/server.post-tabs-open-profile-unknown-returns-404.test.ts` exists on `main` before this PR). ## Human Verification - Ran full `pnpm test` suite — 595 test files, 4,847 tests passing. - Ran targeted `pnpm vitest run --config vitest.e2e.config.ts src/gateway/server.config-apply.e2e.test.ts` — 7/7 passing. - Verified `oxlint` reports 0 warnings, 0 errors on the 3 changed files. - Verified the destructive guard fires correctly at >50%, not >=50%. - Verified `allowDestructive: true` correctly bypasses the guard. - Verified error message names dropped sections by name. - Did NOT verify: behavior when config file is empty or malformed before apply (existing `requireConfigBaseHash` logic handles that path). ## Compatibility / Migration - Backward compatible? Yes — `allowDestructive` is optional; existing clients that apply valid full configs are unaffected. - Config/env changes? No - Migration needed? No ## Failure Recovery - How to disable: revert this commit or pass `allowDestructive: true` in the request. - Files to restore: `src/gateway/server-methods/config.ts`, `src/gateway/protocol/schema/config.ts`. - Known bad symptoms: none expected; if the guard fires incorrectly, the error message names the sections and the caller can add `allowDestructive: true`. ## Risks and Mitigations - Risk: Legitimate clean-slate config replacement is now blocked by default. - Mitigation: `allowDestructive: true` escape hatch is explicit, documented in the error message, and validated by the TypeBox schema. --- **AI disclosure**: This PR was developed with AI assistance (Claude). Logic, tests, and descriptions reviewed for correctness. Fully tested — all 4,847 existing tests pass, plus 5 new test cases covering every branch of the guard.  <h3>Greptile Summary</h3> adds a guard to `config.apply` that prevents catastrophic data loss by rejecting requests that would drop more than 50% of existing top-level config sections ## Changes - added `allowDestructive` parameter to `ConfigApplyParamsSchema` (src/gateway/protocol/schema/config.ts:32) - guard logic checks dropped sections before `writeConfigFile()` (src/gateway/server-methods/config.ts:396-413) - 5 new test cases verify guard behavior across threshold boundaries (src/gateway/server.config-apply.e2e.test.ts:90-223) ## Key behavior - guard only fires when config exists and is an object - threshold: `droppedKeys.length > existingKeys.length / 2` (strictly greater than 50%) - escape hatch: passing `allowDestructive: true` bypasses guard - error message names dropped sections and hints at `allowDestructive` ## Issues - test comment on line 117 incorrectly says 50% case "must be blocked" when it actually passes (test itself is correct) <h3>Confidence Score: 5/5</h3> - safe to merge — solves critical data loss issue with minimal code change and comprehensive tests - guard logic is simple and correct, escape hatch is explicit, all edge cases are tested including boundary conditions, backward compatible for legitimate use cases - no files require special attention — one minor comment clarity issue in test file <sub>Last reviewed commit: 329a382</sub>