#11250: fix: expand skills watcher ignore list and improve session repair logging
Labels: agents, stale, size: S
Cluster: Skill Enhancements and Fixes
Fixes #11181, #11187
## Changes
### #11181: File Descriptor Leak in Skills Watcher
**Problem**: The skills watcher was monitoring all files in `workspace/skills/`, including Python virtual environments (`venv/`, `.venv/`), caches (`__pycache__/`), and Node.js package manager caches (`.npm/`, `.yarn/`, `.pnpm/`). This caused file descriptor leaks when workspaces contained large dependency trees (6000+ files), leading to `spawn EBADF` errors that broke all `exec` tool calls.
**Solution**: Expanded `DEFAULT_SKILLS_WATCH_IGNORED` in `src/agents/skills/refresh.ts` to ignore common dependency and build directories:
- Python: `venv/`, `.venv/`, `__pycache__/`, `.pytest_cache/`, `.mypy_cache/`, `.hypothesis/`
- Node.js: `.npm/`, `.yarn/`, `.pnpm/`
- Rust: `target/`
- Generic: `build/`
- Frameworks: `.next/` (Next.js), `.nuxt/` (Nuxt.js)
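
The ignore list above can be sketched as a set of path-segment patterns. This is a minimal illustration, not the contents of `src/agents/skills/refresh.ts`: the directory names come from the list above, while `IGNORED_DIR_NAMES`, `IGNORED_PATTERNS`, `escapeRegExp`, and `isIgnoredSkillPath` are hypothetical names, and the regex-array shape is an assumption about how `DEFAULT_SKILLS_WATCH_IGNORED` might be expressed.

```typescript
// Hypothetical sketch: directory names are taken from the PR description,
// but the data shape and helper names are assumptions.
const IGNORED_DIR_NAMES: readonly string[] = [
  // Python
  "venv", ".venv", "__pycache__", ".pytest_cache", ".mypy_cache", ".hypothesis",
  // Node.js package manager caches
  ".npm", ".yarn", ".pnpm",
  // Rust
  "target",
  // Generic build output
  "build",
  // Frameworks
  ".next", ".nuxt",
];

// Escape regex metacharacters so names like ".venv" match literally.
function escapeRegExp(text: string): string {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// One pattern per directory name, matching it only as a whole path segment
// (so "build" matches "a/build/b" but not "a/my-build/b").
const IGNORED_PATTERNS: RegExp[] = IGNORED_DIR_NAMES.map(
  (name) => new RegExp(`(^|[\\\\/])${escapeRegExp(name)}([\\\\/]|$)`),
);

// Returns true when any segment of the path is an ignored directory.
function isIgnoredSkillPath(filePath: string): boolean {
  return IGNORED_PATTERNS.some((pattern) => pattern.test(filePath));
}
```

Matching on whole path segments keeps a skill named `my-build` or `venvtools` watchable while still skipping everything beneath a real `build/` or `venv/` directory. If the watcher library accepts regular expressions or a predicate for its ignore option (an assumption here), either `IGNORED_PATTERNS` or `isIgnoredSkillPath` could be passed to it directly.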
### #11187: Session Corruption on Gateway Restart
**Problem**: When the gateway restarts via the `exec` tool (SIGUSR1), the process is killed before the `toolResult` can be written to the session JSONL file, leaving orphaned `toolCall` entries. On the next boot, the API rejects these orphaned entries, triggering an infinite cooldown loop.
**Solution**: The `repairToolUseResultPairing` function already existed to detect and repair orphaned toolCalls by inserting synthetic error results. This change adds comprehensive logging to help users understand when and why session repairs occur:
- **Warning log**: When orphaned toolCalls are detected (explains the likely cause: gateway restart)
- **Info log**: When repair is complete (summarizes added results, dropped duplicates, dropped orphans, moved items)
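
The repair-and-log flow described above can be sketched as follows. This is not the actual `repairToolUseResultPairing` implementation: the `Entry` shape, the `Logger` interface, the `repairOrphanedToolCalls` name, and the log wording are all hypothetical, and the sketch covers only the "added results" case, not dropped duplicates, dropped orphans, or moved items.

```typescript
// Hypothetical entry shapes; the real session JSONL schema may differ.
type Entry =
  | { type: "text"; role: "user" | "assistant"; text: string }
  | { type: "toolCall"; id: string; name: string }
  | { type: "toolResult"; toolCallId: string; isError: boolean; text: string };

// Minimal logger interface so the caller can inject its own logger.
interface Logger {
  warn(message: string): void;
  info(message: string): void;
}

interface RepairReport {
  entries: Entry[];
  addedResults: number;
}

function repairOrphanedToolCalls(entries: Entry[], log: Logger = console): RepairReport {
  // Collect the ids of every tool call that already has a result.
  const answered = new Set<string>();
  for (const entry of entries) {
    if (entry.type === "toolResult") answered.add(entry.toolCallId);
  }

  const repaired: Entry[] = [];
  let addedResults = 0;
  for (const entry of entries) {
    repaired.push(entry);
    // An orphaned toolCall gets a synthetic error result right after it.
    if (entry.type === "toolCall" && !answered.has(entry.id)) {
      repaired.push({
        type: "toolResult",
        toolCallId: entry.id,
        isError: true,
        text: "Tool result missing; synthetic error inserted during session repair.",
      });
      addedResults += 1;
    }
  }

  if (addedResults > 0) {
    // Warning: orphans were detected, with the likely cause.
    log.warn(`Found ${addedResults} orphaned toolCall(s); likely cause: gateway restart before the toolResult was written.`);
    // Info: summary of what the repair changed.
    log.info(`Session repair complete: added ${addedResults} synthetic result(s).`);
  }
  return { entries: repaired, addedResults };
}
```

The sketch logs synchronously through an injected logger and stays silent when nothing was repaired, which is one way to keep the messages from being lost on a fast exit or repeated on clean sessions.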
## Testing
- All existing tests pass (73 tests in session-transcript-repair suite)
- Manual testing: Verified ignore patterns match common dependency directories
- No breaking changes to existing behavior
## Impact
- **#11181**: Prevents file descriptor exhaustion in workspaces with Python skills, reducing the FD leak from 2500+ descriptors per message to near zero
- **#11187**: Improves debugging by providing clear visibility into session repair operations
<!-- greptile_comment -->
<h2>Greptile Overview</h2>
<h3>Greptile Summary</h3>
This PR expands the skills file watcher ignore list to skip common dependency/build directories (e.g., Python venv/caches, package manager caches, and framework build outputs) to avoid file descriptor exhaustion when watching large skill trees.
It also adds additional logging around `repairToolUseResultPairing` so users can see when session transcripts are repaired (e.g., orphaned tool calls after a gateway restart) and what changes were applied.
<h3>Confidence Score: 3/5</h3>
- Mostly safe, but the new logging is unlikely to reliably capture restart-related repairs and may spam duplicate logs.
- The ignore-list changes are low-risk and isolated. The session-repair logging introduces a detached async import/logger call plus unconditional console logging, which can drop the very logs needed during fast exits and duplicate output in normal runs. Fixing logging reliability/duplication would make this safer to merge.
- src/agents/session-transcript-repair.ts
<!-- greptile_other_comments_section -->
<!-- /greptile_comment -->
## Most Similar PRs
- #14023: fix: filter skills watcher to relevant file types to prevent FD exh... (by funmerlin · 2026-02-11, 84.0% similar)
- #10016: fix: prevent FD exhaustion from skill watcher scanning artifact trees (by oldeucryptoboi · 2026-02-06, 83.4% similar)
- #8291: Fix: Add Python virtual environment ignore patterns to skills watcher (by vishaltandale00 · 2026-02-03, 82.2% similar)
- #22525: [Bug]: Session snapshot not reloading skills after gateway restart ... (by zwffff · 2026-02-21, 81.9% similar)
- #12076: fix(skills): recursive directory filtering to actually exclude venv... (by xiaoyaner0201 · 2026-02-08, 81.7% similar)
- #23749: fix some issues (by tronpis · 2026-02-22, 81.6% similar)
- #15050: fix: transcript corruption resilience — strip aborted tool_use bloc... (by yashchitneni · 2026-02-12, 81.3% similar)
- #9595: fix(skills): ignore .venv, __pycache__, and .openclaw to prevent FD... (by amoghacloud · 2026-02-05, 81.0% similar)
- #22568: fix(gateway): bump skills snapshot version on startup so sessions r... (by zwffff · 2026-02-21, 80.9% similar)
- #16654: fix: refresh skills snapshot when managed skills change (by PhineasFleabottom · 2026-02-15, 80.3% similar)