
#9049: fix: prevent subagent stuck loops and ensure user feedback

by maxtongwang open 2026-02-04 20:01
channel: slack agents stale size: XL
## Problem

When a background subagent fails, OpenClaw can get stuck and the user receives no feedback:

1. **Compaction infinite loop:** `handleAutoCompactionEnd()` calls `noteCompactionRetry()` with no upper bound. When pi-ai keeps signaling `willRetry: true`, the `while(true)` run loop in `run.ts` loops indefinitely: the subagent hangs forever and the user's session appears frozen.
2. **Silent announce failure:** When `runSubagentAnnounceFlow()` fails (e.g. gateway timeout, network error), the error is caught and logged but the user never learns the subagent completed. The only recovery is an app restart via `resumeSubagentRun()`.
3. **Context overflow on completion:** When subagents complete, their full output is dumped into the parent session, causing context overflow. The 8000-character truncation was a band-aid; multiple subagents still flood the parent.

## Changes

### Fix 1: Compaction retry circuit breaker

Add `MAX_COMPACTION_RETRIES = 3`. When `handleAutoCompactionEnd` receives a 4th `willRetry: true`, it sets `compactionRetryExhausted = true`, resets the pending count, and resolves the compaction wait. The main loop in `attempt.ts` then sets a `CompactionRetryExhaustedError` as `promptError`. The outer loop in `run.ts` classifies this as a `compaction_failure` (via `isCompactionFailureError`) and returns a user-facing context overflow error, with no further retry.

**Files:** `pi-embedded-subscribe.handlers.lifecycle.ts`, `pi-embedded-subscribe.handlers.types.ts`, `pi-embedded-subscribe.ts`, `pi-embedded-runner/run/attempt.ts`

### Fix 2: Announce retry with fallback notification

Wrap the announce delivery section in a retry loop (3 attempts, 2s delay between retries). If all attempts fail, send a brief fallback notification to the requester: *"A background task completed but results could not be delivered."* If even the fallback fails, log the error; the existing retry-on-wake path in `finalizeSubagentCleanup` still applies as a last resort.

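
For reviewers, the retry-then-fallback flow in Fix 2 can be sketched roughly as below. This is a minimal illustration, not the code in `subagent-announce.ts`: `announceWithRetry`, `AnnounceDeps`, `deliver`, `notifyFallback`, `sleep`, and the string outcome type are all hypothetical names chosen for the sketch (the real flow goes through `callGateway` and, per the test plan, returns a boolean). Only the 3 attempts, the 2s delay, and the fallback message come from this PR.

```typescript
// Hypothetical sketch of "retry, then fallback notification".
// None of these names are OpenClaw APIs; they are illustrative only.
type AnnounceOutcome = "delivered" | "fallback" | "failed";

interface AnnounceDeps {
  deliver: () => Promise<void>; // stands in for the real gateway call
  notifyFallback: (message: string) => Promise<void>;
  sleep?: (ms: number) => Promise<void>; // injectable so tests do not wait
}

const MAX_ANNOUNCE_ATTEMPTS = 3; // from the PR description
const RETRY_DELAY_MS = 2000; // 2s between retries, from the PR description
const FALLBACK_MESSAGE =
  "A background task completed but results could not be delivered.";

const defaultSleep = (ms: number): Promise<void> =>
  new Promise((resolve) => setTimeout(resolve, ms));

async function announceWithRetry(deps: AnnounceDeps): Promise<AnnounceOutcome> {
  const sleep = deps.sleep ?? defaultSleep;

  for (let attempt = 1; attempt <= MAX_ANNOUNCE_ATTEMPTS; attempt++) {
    try {
      await deps.deliver();
      return "delivered";
    } catch (err) {
      console.warn(`announce attempt ${attempt} failed`, err);
      // Only wait when another attempt will follow.
      if (attempt < MAX_ANNOUNCE_ATTEMPTS) await sleep(RETRY_DELAY_MS);
    }
  }

  // All delivery attempts failed: tell the requester something happened.
  try {
    await deps.notifyFallback(FALLBACK_MESSAGE);
    return "fallback";
  } catch (err) {
    // Last resort is logging; a retry-on-wake path would sit outside this sketch.
    console.error("fallback notification failed", err);
    return "failed";
  }
}
```

Injecting `sleep` is a design choice of the sketch so that the three test-plan cases (fail once then succeed, fail all attempts, fail everything) can run without real delays.
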
**Files:** `subagent-announce.ts`

### Fix 3: Stream subagent progress to dedicated threads

Stream progress to dedicated threads (Discord/Slack) instead of dumping full output to parent:

- **Progress threads:** Create a progress thread when subagent spawns (Discord/Slack)
- **Batched digests:** Queue tool events with debounced batching (3s delay, max 5 tools per digest)
- **Brief summaries:** Send only a 300-character summary on completion instead of the 8000-character full output
- **Fallback for non-threaded channels:** Use `[task-label]` prefixed messages with higher debounce (5s)

**Files:** `subagent-progress-stream.ts` (new), `subagent-registry.ts`, `subagent-announce.ts`, `slack/send.ts`

### Enhancement: Parent context for subagents

Add an optional `context` field to `sessions_spawn`. When provided, it's appended as a `## Background Context` section in the subagent's system prompt. This lets the main agent pass relevant conversation context (user preferences, prior findings) to subagents that would otherwise start with zero context.

**Files:** `sessions-spawn-tool.ts`, `subagent-announce.ts`

## Test plan

- [x] Extended compaction test: send 4+ `auto_compaction_end` events with `willRetry: true`, verify `isCompactionRetryExhausted()` returns `true` and `waitForCompactionRetry()` resolves
- [x] New announce retry tests: mock `callGateway` to fail once then succeed (retry works), fail all 3 attempts (fallback sent), fail everything (returns `false`)
- [x] New `buildSubagentSystemPrompt` tests: verify context section presence/absence
- [x] New progress stream tests: 26 tests covering state management, batching, thread creation, parallel subagents, edge cases
- [x] Updated announce format/retry tests: match new brief summary format
- [x] `pnpm build`: compiles cleanly
- [x] `pnpm check`: no lint/format errors
- [x] `pnpm test`: all tests pass

🤖 Generated with [Claude Code](https://claude.com/claude-code)
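
As a reviewer aid, the batched-digest and brief-summary behavior from Fix 3 might look something like the sketch below. It is not taken from `subagent-progress-stream.ts`: `DigestBatcher`, `ToolEvent`, and `briefSummary` are hypothetical names, and flushing immediately once 5 events are queued is an assumption about how "max 5 tools per digest" is enforced. The 3s debounce, the cap of 5, and the 300-character summary are the values stated in this PR.

```typescript
// Hypothetical sketch of debounced digest batching; names are illustrative only.
type ToolEvent = { tool: string; detail: string };

const DEBOUNCE_MS = 3000; // 3s for threaded channels (5s would apply to the non-threaded fallback)
const MAX_TOOLS_PER_DIGEST = 5;
const SUMMARY_MAX_CHARS = 300;

class DigestBatcher {
  private pending: ToolEvent[] = [];
  private timer: ReturnType<typeof setTimeout> | undefined = undefined;
  private send: (digest: string) => void;
  private debounceMs: number;

  constructor(send: (digest: string) => void, debounceMs: number = DEBOUNCE_MS) {
    this.send = send;
    this.debounceMs = debounceMs;
  }

  push(event: ToolEvent): void {
    this.pending.push(event);
    if (this.pending.length >= MAX_TOOLS_PER_DIGEST) {
      this.flush(); // assumption: a full batch is sent right away
      return;
    }
    // Restart the debounce window on every new event.
    if (this.timer !== undefined) clearTimeout(this.timer);
    this.timer = setTimeout(() => this.flush(), this.debounceMs);
  }

  flush(): void {
    if (this.timer !== undefined) {
      clearTimeout(this.timer);
      this.timer = undefined;
    }
    if (this.pending.length === 0) return;
    const batch = this.pending.splice(0, MAX_TOOLS_PER_DIGEST);
    this.send(batch.map((e) => `- ${e.tool}: ${e.detail}`).join("\n"));
  }
}

// Truncate completion output to a brief summary of at most maxChars characters.
function briefSummary(output: string, maxChars: number = SUMMARY_MAX_CHARS): string {
  if (output.length <= maxChars) return output;
  return output.slice(0, maxChars - 1) + "…";
}
```

In this sketch a non-threaded channel would construct the batcher with `5000` as `debounceMs` and prefix each digest with its `[task-label]` inside `send`.
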
