#11821: fix(auth): trigger failover on 401 status code from expired OAuth tokens
Labels: docs, agents, stale
Cluster: Model Fallback and Error Handling
#### Summary
When an OAuth (`claude-token`) setup token expires, the bot receives a 401 `authentication_error` and crashes instead of failing over to the configured direct API key. The failover logic for prompt errors relied solely on string matching (`classifyFailoverReason(errorText)`), which misses 401 errors when the HTTP status code is present only on the error object and not in the message text.
Closes #11674
#### Repro Steps
1. Configure both `anthropic:claude-token` (OAuth) and `anthropic:default` (API key) auth profiles
2. Wait for the OAuth token to expire
3. Send a message — the bot crashes with a raw `authentication_error` instead of falling back to the API key
#### Root Cause
The prompt error handling path in `src/agents/pi-embedded-runner/run.ts` used `classifyFailoverReason(errorText)`, which only matches known string patterns (e.g., "authentication", "401", "unauthorized"). If the error text contains none of these patterns but the HTTP error object has `status: 401`, failover is not triggered.
The fix: use `resolveFailoverReasonFromError(promptError)`, which checks the error object's `status`/`statusCode` property first (catching 401, 402, 403, 408, 429) and then falls back to string matching. This mirrors how `resolveFailoverReasonFromError` already works in the model fallback path.
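To make the ordering concrete, here is a minimal sketch of "status code first, message text second" classification. It is not the project's implementation: `resolveReasonSketch`, `readStatusCode`, `classifyByText`, the `SketchReason` labels, and the status-to-reason mapping are all assumptions made for illustration. Only the list of status codes (401, 402, 403, 408, 429) comes from the description above.

```typescript
// Hypothetical sketch: these names and reason labels are illustrative only.
type SketchReason = "auth" | "billing" | "timeout" | "rate_limit";

// Assumed mapping; the PR only states that these five status codes are caught.
const STATUS_TO_REASON = new Map<number, SketchReason>([
  [401, "auth"],
  [402, "billing"],
  [403, "auth"],
  [408, "timeout"],
  [429, "rate_limit"],
]);

// Read a numeric `status` or `statusCode` from an unknown error value.
function readStatusCode(err: unknown): number | undefined {
  if (typeof err !== "object" || err === null) return undefined;
  const candidate = err as { status?: unknown; statusCode?: unknown };
  const value = candidate.status ?? candidate.statusCode;
  return typeof value === "number" ? value : undefined;
}

// Stand-in for string-only classification of the message text.
function classifyByText(text: string): SketchReason | null {
  const lower = text.toLowerCase();
  if (/authentication|unauthorized|\b401\b/.test(lower)) return "auth";
  if (/rate limit|\b429\b/.test(lower)) return "rate_limit";
  return null;
}

// Check the status code on the error object first, then fall back to the text.
function resolveReasonSketch(err: unknown): SketchReason | null {
  const status = readStatusCode(err);
  const byStatus = status === undefined ? undefined : STATUS_TO_REASON.get(status);
  if (byStatus !== undefined) return byStatus;
  const text = err instanceof Error ? err.message : String(err);
  return classifyByText(text);
}
```

With this shape, an error whose message is just "token expired" but which carries `status: 401` still resolves to an auth reason, which is the case string-only matching would miss.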
#### Behavior Changes
- Prompt errors with HTTP 401/403 status codes now trigger auth profile failover, even if the error message text doesn't match known patterns
- The model fallback throw path also uses the improved status-code-aware reason detection
- No change to behavior when string patterns already match (backward compatible)
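The first behavior change can be pictured with a small, synchronous sketch of profile rotation. Everything here is hypothetical: `runWithProfileFailover` and the `attempt` callback are invented for illustration, and the real runner is likely asynchronous and more involved. The profile names are the ones from the repro steps.

```typescript
// Hypothetical sketch: none of these names come from the project.
// Try each auth profile in order; rotate only when the failure carries an auth status.
function runWithProfileFailover(
  profiles: string[],
  attempt: (profile: string) => string,
): string {
  let lastError: unknown;
  for (const profile of profiles) {
    try {
      return attempt(profile);
    } catch (err) {
      // Inspect the status on the error object rather than the message text.
      const status = (err as { status?: unknown } | null)?.status;
      if (status !== 401 && status !== 403) throw err; // not an auth failure: do not rotate
      lastError = err;
    }
  }
  throw lastError ?? new Error("no auth profiles configured");
}

// Usage: the OAuth profile fails with a 401, so the API-key profile is tried next.
const result = runWithProfileFailover(
  ["anthropic:claude-token", "anthropic:default"],
  (profile) => {
    if (profile === "anthropic:claude-token") {
      throw Object.assign(new Error("token expired"), { status: 401 });
    }
    return `answered via ${profile}`;
  },
);
console.log(result);
```

Errors without an auth status are rethrown immediately in this sketch, so unrelated failures do not consume the remaining profiles.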
#### Codebase and GitHub Search
- Reviewed `resolveFailoverReasonFromError` in `src/agents/failover-error.ts` — already handles 401/403 via status code
- Reviewed `classifyFailoverReason` in `src/agents/pi-embedded-helpers/errors.ts` — string-only matching
- Confirmed `getStatusCode` extracts `status`/`statusCode` from error objects
- No existing PRs for #11674
#### Tests
- Added 401 status test to `src/agents/failover-error.test.ts` (now 7 tests, all pass)
- Added `coerces 401 auth errors with status code even without matching message text` test
- Updated mock in `run.overflow-compaction.test.ts` to include new import
- All 82 pi-embedded-runner + failover tests pass
#### Sign-Off
- Models used: Claude (AI-assisted)
- Submitter effort: Traced error flow through prompt handling, identified string-only matching gap, implemented status-code-aware fix
- Agent notes: The `resolveFailoverReasonFromError` function already existed and handled 401 correctly — the bug was that the prompt error path wasn't using it. Minimal change: 3 lines of logic + 1 import.
<!-- greptile_comment -->
<h2>Greptile Overview</h2>
<h3>Greptile Summary</h3>
This PR fixes auth-profile failover for prompt submission errors when expired OAuth tokens return an HTTP 401/403 on the error object but the message text doesn’t contain recognizable auth keywords. It does this by using `resolveFailoverReasonFromError(promptError)` (status-code-aware) in `src/agents/pi-embedded-runner/run.ts`, falling back to the existing string-based `classifyFailoverReason(errorText)`, and adds a regression test in `src/agents/failover-error.test.ts` to cover 401 status handling.
The changes integrate with the existing failover infrastructure (`FailoverError`, `resolveFailoverStatus`, auth profile rotation) by ensuring prompt-error handling uses the same error-object inspection logic already used elsewhere, so configured direct API key fallbacks can activate when OAuth tokens expire.
<h3>Confidence Score: 3/5</h3>
- This PR is close to mergeable but has a logic guard that can trigger failover on non-failover errors.
- The intended fix (status-code-aware failover classification) is sound and covered by tests, but the new `promptFailoverReason !== null` checks can evaluate true for `undefined`, which changes control flow and can cause incorrect auth rotation / FailoverError throwing. After tightening that guard, risk should be low.
- src/agents/pi-embedded-runner/run.ts
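
The guard concern above comes down to how strict inequality treats `undefined`. The sketch below is illustrative only: `strictGuard`, `looseGuard`, and the `GuardReason` type are hypothetical and do not appear in the PR. It shows why `!== null` lets `undefined` through and one possible way to tighten the check; the review does not say which fix was ultimately applied.

```typescript
// Hypothetical names, used only to illustrate the comparison semantics.
type GuardReason = "auth" | "rate_limit";

// Strict check: `undefined !== null` is true, so an undefined reason passes the guard.
function strictGuard(reason: GuardReason | null | undefined): boolean {
  return reason !== null;
}

// Loose check: `!= null` rejects both null and undefined.
function looseGuard(reason: GuardReason | null | undefined): boolean {
  return reason != null;
}

console.log(strictGuard(undefined)); // true
console.log(looseGuard(undefined)); // false
console.log(looseGuard("auth")); // true
```

A value can be `undefined` here if, for example, a mocked helper in a test returns nothing, so a guard that accepts only real reasons is the safer choice.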
<!-- greptile_other_comments_section -->
<!-- /greptile_comment -->
Most Similar PRs
#10178: fix: trigger fallback when model resolution fails with unknown model
by Yida-Dev · 2026-02-06
81.6%
#7229: fix: add network error resilience to agentic loop failover
by ai-fanatic · 2026-02-02
80.1%
#19267: fix: derive failover reason from timedOut flag to prevent unknown c...
by austenstone · 2026-02-17
80.0%
#21017: fix: treat HTTP 502/503/504 as failover-eligible (timeout reason)
by taw0002 · 2026-02-19
79.8%
#12314: fix: treat HTTP 5xx server errors as failover-worthy
by hsssgdtc · 2026-02-09
79.8%
#4097: fix: classify AWS SSO token errors as auth for model fallback (AI-a...
by guyelia · 2026-01-29
79.3%
#2123: fix(auth): sync from Claude CLI keychain before OAuth refresh
by jorge123255 · 2026-01-26
79.1%
#9427: fix: trigger model fallback on all 4xx HTTP errors
by dbottme · 2026-02-05
78.8%
#15815: Fallback LLM doesn't trigger if primary model is local
by shihanqu · 2026-02-13
78.3%
#21491: fix: classify Google 503 UNAVAILABLE as transient failover [AI-assi...
by ZPTDclaw · 2026-02-20
78.0%