
#11250: fix: expand skills watcher ignore list and improve session repair logging

by zhangzhefang-github open 2026-02-07 16:26
agents stale size: S
Fixes #11181, #11187

## Changes

### #11181: File Descriptor Leak in Skills Watcher

**Problem**: The skills watcher was monitoring all files in `workspace/skills/`, including Python virtual environments (`venv/`, `.venv/`), caches (`__pycache__/`), and Node.js package manager caches (`.npm/`, `.yarn/`, `.pnpm/`). This caused file descriptor leaks when workspaces contained large dependency trees (6000+ files), leading to `spawn EBADF` errors that broke all `exec` tool calls.

**Solution**: Expanded `DEFAULT_SKILLS_WATCH_IGNORED` in `src/agents/skills/refresh.ts` to ignore common dependency and build directories:

- Python: `venv/`, `.venv/`, `__pycache__/`, `.pytest_cache/`, `.mypy_cache/`, `.hypothesis/`
- Node.js: `.npm/`, `.yarn/`, `.pnpm/`
- Rust: `target/`
- Generic: `build/`
- Frameworks: `.next/` (Next.js), `.nuxt/` (Nuxt.js)

### #11187: Session Corruption on Gateway Restart

**Problem**: When the gateway restarts via `exec` tool (SIGUSR1), the process is killed before `toolResult` can be written to the session JSONL file, creating orphaned `toolCall` entries. On next boot, the API rejects these orphaned entries, triggering an infinite cooldown loop.

**Solution**: The `repairToolUseResultPairing` function already existed to detect and repair orphaned toolCalls by inserting synthetic error results.
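The repair idea described above (pair every `toolCall` with a `toolResult`, inserting a synthetic error result where one is missing) can be illustrated with a minimal sketch. This is not the project's `repairToolUseResultPairing`: the entry shapes, the `repairOrphanedToolCalls` name, and the `RepairReport` type are hypothetical and were chosen only for illustration, and the real session JSONL schema may differ.

```typescript
// Hypothetical, simplified transcript entries; the real session schema may differ.
type ToolCall = { kind: "toolCall"; id: string; name: string };
type ToolResult = {
  kind: "toolResult";
  toolCallId: string;
  isError: boolean;
  content: string;
};
type Message = { kind: "message"; role: "user" | "assistant"; text: string };
type Entry = ToolCall | ToolResult | Message;

// Hypothetical summary of what the repair changed.
interface RepairReport {
  entries: Entry[];
  addedResults: number;
}

// Insert a synthetic error result right after every toolCall that has no
// matching toolResult anywhere in the transcript.
function repairOrphanedToolCalls(entries: Entry[]): RepairReport {
  // Collect the ids of all tool calls that already have a result.
  const answered = new Set<string>();
  for (const entry of entries) {
    if (entry.kind === "toolResult") {
      answered.add(entry.toolCallId);
    }
  }

  const repaired: Entry[] = [];
  let addedResults = 0;
  for (const entry of entries) {
    repaired.push(entry);
    if (entry.kind === "toolCall" && !answered.has(entry.id)) {
      repaired.push({
        kind: "toolResult",
        toolCallId: entry.id,
        isError: true,
        content: "synthetic result: the original tool result was never written",
      });
      addedResults += 1;
    }
  }
  return { entries: repaired, addedResults };
}
```

Returning a count alongside the repaired entries keeps the function free of side effects, so a caller can decide how and where to report the repair.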
This change adds comprehensive logging to help users understand when and why session repairs occur:

- **Warning log**: When orphaned toolCalls are detected (explains the likely cause: gateway restart)
- **Info log**: When repair is complete (summarizes added results, dropped duplicates, dropped orphans, moved items)

## Testing

- All existing tests pass (73 tests in session-transcript-repair suite)
- Manual testing: Verified ignore patterns match common dependency directories
- No breaking changes to existing behavior

## Impact

- **#11181**: Prevents file descriptor exhaustion in workspaces with Python skills, reducing FD leak from 2500+ per message to near zero
- **#11187**: Improves debugging by providing clear visibility into session repair operations

## Greptile Overview

### Greptile Summary

This PR expands the skills file watcher ignore list to skip common dependency/build directories (e.g., Python venv/caches, package manager caches, and framework build outputs) to avoid file descriptor exhaustion when watching large skill trees.

It also adds additional logging around `repairToolUseResultPairing` so users can see when session transcripts are repaired (e.g., orphaned tool calls after a gateway restart) and what changes were applied.

### Confidence Score: 3/5

- Mostly safe, but the new logging is unlikely to reliably capture restart-related repairs and may spam duplicate logs.
- The ignore-list changes are low-risk and isolated. The session-repair logging introduces a detached async import/logger call plus unconditional console logging, which can drop the very logs needed during fast exits and duplicate output in normal runs. Fixing logging reliability/duplication would make this safer to merge.
- src/agents/session-transcript-repair.ts
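For the #11181 ignore list, the following is a minimal sketch of how directory names like those listed in the PR description could be turned into path-segment matchers. The names `IGNORED_DIR_NAMES`, `IGNORED_PATTERNS`, and `isIgnoredPath` are hypothetical. The actual shape of `DEFAULT_SKILLS_WATCH_IGNORED` in `src/agents/skills/refresh.ts` is not shown in this PR description and may differ.

```typescript
// Directory names copied from the PR description. Illustrative only: the real
// DEFAULT_SKILLS_WATCH_IGNORED constant may be structured differently.
const IGNORED_DIR_NAMES: readonly string[] = [
  // Python
  "venv", ".venv", "__pycache__", ".pytest_cache", ".mypy_cache", ".hypothesis",
  // Node.js package manager caches
  ".npm", ".yarn", ".pnpm",
  // Rust
  "target",
  // Generic
  "build",
  // Frameworks
  ".next", ".nuxt",
];

// Escape characters that have special meaning inside a regular expression.
const escapeForRegExp = (text: string): string =>
  text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");

// One pattern per name, matching it only as a whole path segment so that
// "build" matches "a/build/x" but not "a/builder/x". Both "/" and "\" are
// treated as separators.
const IGNORED_PATTERNS: RegExp[] = IGNORED_DIR_NAMES.map(
  (name) => new RegExp(`(^|[\\\\/])${escapeForRegExp(name)}([\\\\/]|$)`),
);

// Returns true when any segment of the path is an ignored directory name.
function isIgnoredPath(path: string): boolean {
  return IGNORED_PATTERNS.some((pattern) => pattern.test(path));
}
```

Matching on whole segments rather than substrings is a deliberate choice in this sketch: it avoids accidentally ignoring a skill whose name merely contains `build` or `target`. A predicate of this kind could be passed to a file watcher's ignore option, though whether the project's watcher accepts it in this form is an assumption.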

Most Similar PRs