#8713: feat: gateway memory monitor, install linger, docs and failover

by quratus open 2026-02-04 09:27

Labels: docs, channel: telegram, app: web-ui, gateway, cli, agents, stale

Cluster: Memory Optimization and Gateway Enhancements

## Summary

- **Memory monitor**: Gateway memory monitor with configurable thresholds (env: `OPENCLAW_MEMORY_*`), warning/critical/fatal handling, and aggressive cleanup (dedupe, chat run buffers, aborted runs).
- **Gateway install**: Enable systemd user linger on Linux so the gateway survives logout; docs updates (cron, gateway, environment, troubleshooting).
- **Failover / heartbeat / auto-reply**: Pi-embedded error classification, gateway tool hints, dispatch and followup-runner tweaks, command-queue improvements.
- **Project docs**: `agent_learnings.md`, `chatroom.md`.

## Commits

1. feat: gateway memory monitor and graceful degradation
2. chore: gateway install linger, docs, failover/heartbeat and auto-reply

<h2>Greptile Overview</h2>

<h3>Greptile Summary</h3>

This PR adds a gateway memory monitor with env-configurable thresholds and aggressive cleanup hooks, updates gateway install/docs (systemd linger, cron, environment/troubleshooting), and tweaks failover/heartbeat/auto-reply behaviors and command queue handling.

Main issues found are:

1. Accidental debug/telemetry instrumentation added across multiple hot paths (auto-reply dispatch, gateway chat, heartbeat, command queue, TUI), including hardcoded local file paths and localhost HTTP ingest.
2. Inverted fallback attribution logic that drops provider/model fields when a fallback is actually used.

There is also a UI behavior change that refreshes sessions more broadly on chat final events, which may add unnecessary reloads.

<h3>Confidence Score: 2/5</h3>

- Not safe to merge as-is due to debug telemetry/file I/O introduced in production code paths.
- Multiple files introduce hardcoded debug logging that performs network calls and writes to a developer-specific path, which is likely accidental and would affect runtime behavior/privacy. Separately, the new fallback-tracking logic appears inverted and would drop provider/model attribution for fallback runs, impacting correctness of usage tracking.
- src/auto-reply/dispatch.ts; src/auto-reply/reply/dispatch-from-config.ts; src/gateway/server-methods/chat.ts; src/infra/heartbeat-runner.ts; src/process/command-queue.ts; src/tui/tui-command-handlers.ts; src/auto-reply/reply/agent-runner.ts; src/auto-reply/reply/followup-runner.ts

<details><summary><h4>Context used (3)</h4></summary>

- Context from `dashboard` - CLAUDE.md ([source](https://app.greptile.com/review/custom-context?memory=fd949e91-5c3a-4ab5-90a1-cbe184fd6ce8))
- Context from `dashboard` - AGENTS.md ([source](https://app.greptile.com/review/custom-context?memory=0d0c8278-ef8e-4d6c-ab21-f5527e322f13))
- Context from `dashboard` - docs/cli/agents.md ([source](https://app.greptile.com/review/custom-context?memory=057a11aa-5c5f-48bb-8d53-91b27b0fe3a2))

</details>
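
For reviewers who want a concrete picture of the warning/critical/fatal handling mentioned in the Summary, here is a minimal sketch of env-driven threshold classification. It is not taken from the PR: the variable names `OPENCLAW_MEMORY_WARNING_MB`, `OPENCLAW_MEMORY_CRITICAL_MB`, and `OPENCLAW_MEMORY_FATAL_MB`, the default values, and the helpers `loadThresholds` and `classifyMemory` are all hypothetical. Only the `OPENCLAW_MEMORY_*` prefix comes from the description.

```typescript
// Hypothetical sketch: env var names, defaults, and function names are
// assumptions for illustration, not the PR's actual implementation.
type MemoryLevel = "ok" | "warning" | "critical" | "fatal";

interface MemoryThresholds {
  warningMb: number;
  criticalMb: number;
  fatalMb: number;
}

type EnvMap = Record<string, string | undefined>;

// Read a positive number from an env-style map; fall back when the value
// is missing, empty, non-numeric, or not positive.
function readMb(env: EnvMap, key: string, fallback: number): number {
  const parsed = Number(env[key]);
  return Number.isFinite(parsed) && parsed > 0 ? parsed : fallback;
}

// Build thresholds from the (assumed) OPENCLAW_MEMORY_* variables.
function loadThresholds(env: EnvMap): MemoryThresholds {
  return {
    warningMb: readMb(env, "OPENCLAW_MEMORY_WARNING_MB", 512),
    criticalMb: readMb(env, "OPENCLAW_MEMORY_CRITICAL_MB", 1024),
    fatalMb: readMb(env, "OPENCLAW_MEMORY_FATAL_MB", 2048),
  };
}

// Map a resident-set size (in MB) to a severity level, checking the
// highest threshold first so the most severe match wins.
function classifyMemory(rssMb: number, t: MemoryThresholds): MemoryLevel {
  if (rssMb >= t.fatalMb) return "fatal";
  if (rssMb >= t.criticalMb) return "critical";
  if (rssMb >= t.warningMb) return "warning";
  return "ok";
}
```

In a Node process the input could plausibly come from `process.memoryUsage().rss / (1024 * 1024)`, sampled on a timer. Keeping classification as a pure function, separate from sampling and cleanup, makes the thresholds easy to unit test without touching real memory state.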