
#13928: Classify session lock timeouts separately and improve lock diagnostics

by kirillsaven open 2026-02-11 07:15
agents stale
## Summary

This PR improves reliability diagnostics around session lock contention and prevents lock-contention errors from being misclassified as provider/model failover.

## AI disclosure

- AI-assisted: **Yes** (OpenClaw + Codex).
- Testing level: **Fully tested for touched paths** (36 targeted tests passed).
- Prompts/session logs: available on request (sanitized excerpts can be shared).
- Code understanding: confirmed; changes were reviewed manually before submission.
- Guide code word: **lobster-biscuit**.

### What changed

- Added structured lock-timeout errors:
  - `SessionFileLockTimeoutError` (`SESSION_FILE_LOCK_TIMEOUT`)
  - `SessionStoreLockTimeoutError` (`SESSION_STORE_LOCK_TIMEOUT`)
- Added owner diagnostics to timeout messages:
  - `owner_alive=0|1`
  - `owner_age_ms=<n>`
- Updated failover classification to **not** treat session lock timeout errors as provider failover signals.
- Added regression tests:
  - failover classification excludes lock-timeout errors
  - model fallback does not continue on lock-timeout errors
  - session write lock timeout exposes structured diagnostics

## Why

Previously, lock contention could appear in `All models failed ...` summaries together with provider cooldown/rate-limit failures. This mixed local lock failures with provider state and made incidents harder to triage.

## Scope / risk

- Small and localized.
- No lock acquisition semantics changed.
- No config or migration changes.

## Test plan

```bash
pnpm vitest run src/agents/failover-error.test.ts src/agents/model-fallback.test.ts src/agents/session-write-lock.test.ts
```

Result on local run: all tests passed (36/36).
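For reviewers who want a quick mental model of the items under "What changed", here is a minimal sketch of how structured lock-timeout errors and the failover exclusion could fit together. It is not the code in this PR. Only the two class names, the two error codes, and the `owner_alive` / `owner_age_ms` message fields come from the description above. The constructor shapes, the `LockOwnerDiagnostics` type, and the helpers `formatOwnerDiagnostics`, `isSessionLockTimeout`, and `runWithFallback` are hypothetical, and the fallback loop is synchronous only to keep the example short.

```typescript
// Illustrative sketch only. Names marked "hypothetical" are assumptions,
// not identifiers taken from the PR.

// Hypothetical shape for the owner diagnostics added to timeout messages.
interface LockOwnerDiagnostics {
  ownerAlive: boolean; // rendered as owner_alive=0|1
  ownerAgeMs: number; // rendered as owner_age_ms=<n>
}

// Hypothetical helper that renders the two diagnostic fields.
function formatOwnerDiagnostics(d: LockOwnerDiagnostics): string {
  return `owner_alive=${d.ownerAlive ? 1 : 0} owner_age_ms=${d.ownerAgeMs}`;
}

class SessionFileLockTimeoutError extends Error {
  readonly code = "SESSION_FILE_LOCK_TIMEOUT" as const;
  readonly owner: LockOwnerDiagnostics;

  // Constructor arguments are assumed for illustration.
  constructor(path: string, owner: LockOwnerDiagnostics) {
    super(`session file lock timeout: ${path} (${formatOwnerDiagnostics(owner)})`);
    this.name = "SessionFileLockTimeoutError";
    this.owner = owner;
  }
}

class SessionStoreLockTimeoutError extends Error {
  readonly code = "SESSION_STORE_LOCK_TIMEOUT" as const;
  readonly owner: LockOwnerDiagnostics;

  constructor(storeId: string, owner: LockOwnerDiagnostics) {
    super(`session store lock timeout: ${storeId} (${formatOwnerDiagnostics(owner)})`);
    this.name = "SessionStoreLockTimeoutError";
    this.owner = owner;
  }
}

// Hypothetical classifier: checks the structured code rather than parsing
// the message, so it keeps working if the message wording changes.
function isSessionLockTimeout(err: unknown): boolean {
  if (typeof err !== "object" || err === null) return false;
  const code = (err as { code?: unknown }).code;
  return code === "SESSION_FILE_LOCK_TIMEOUT" || code === "SESSION_STORE_LOCK_TIMEOUT";
}

// Hypothetical, simplified fallback loop. A lock timeout is local contention,
// so it is rethrown immediately instead of being counted as a model failure.
function runWithFallback<T>(models: string[], attempt: (model: string) => T): T {
  const failures: string[] = [];
  for (const model of models) {
    try {
      return attempt(model);
    } catch (err) {
      if (isSessionLockTimeout(err)) throw err;
      failures.push(`${model}: ${String(err)}`);
    }
  }
  throw new Error(`All models failed: ${failures.join("; ")}`);
}
```

In this sketch a lock timeout never reaches the `All models failed ...` summary, which matches the triage concern described under "Why". Classifying on `code` instead of `instanceof` is a design choice of the sketch; the PR may do it differently.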
