#17345: feat: Memory kernel rebuild with token budgeting, summary sidecar, and outputRef retrieval

by markmusson open 2026-02-15 17:25

docs agents size: XL

## Summary

- Problem: OpenClaw context assembly relied on turn-count slicing and late compaction. In long sessions this caused context bloat, unstable continuity, and expensive cache-write spikes.
- Why it matters: This affects reliability (summary loops/orphans), memory quality, and token/cost efficiency in real agent sessions.
- What changed:
  - Added deterministic token-budget history planning in the PI embedded runner.
  - Capped injected memory retrieval payloads (`maxInjectedChars`, default 4000).
  - Added incremental session-summary sidecar state with loop/rewind hardening and context-pressure-based injection.
  - Externalized oversized tool outputs into `tool-output/*.json` with transcript `details.outputRef` pointers.
  - Added on-demand `outputRef` payload retrieval to `sessions_history` (`outputRefPath`, `outputRefMaxChars`) with path scoping and SHA-256 verification.
  - Added a technical design and implementation report in docs.
- What did NOT change (scope boundary): No provider/model contract changes, no migration of existing memory stores, and no change to default session storage locations.

## Change Type (select all)

- [x] Bug fix
- [x] Feature
- [x] Refactor
- [x] Docs
- [x] Security hardening
- [ ] Chore/infra

## Scope (select all touched areas)

- [x] Gateway / orchestration
- [x] Skills / tool execution
- [ ] Auth / tokens
- [x] Memory / storage
- [ ] Integrations
- [x] API / contracts
- [ ] UI / DX
- [ ] CI/CD / infra

## Linked Issue/PR

- Closes #N/A
- Related #17345

## User-visible / Behavior Changes

- `sessions_history` now preserves `toolResult.details.outputRef` metadata when history is sanitized.
- `sessions_history` now supports optional on-demand payload retrieval:
  - `outputRefPath`
  - `outputRefMaxChars`
- Large tool outputs are persisted as file refs instead of bloating transcript message content.

## Security Impact (required)

- New permissions/capabilities? (`Yes/No`): Yes
- Secrets/tokens handling changed? (`Yes/No`): No
- New/changed network calls? (`Yes/No`): No
- Command/tool execution surface changed? (`Yes/No`): Yes
- Data access scope changed? (`Yes/No`): Yes
- If any `Yes`, explain risk + mitigation:
  - Risk: path traversal when reading payload files via `outputRefPath`.
  - Mitigation: retrieval is constrained to the session-local `tool-output/` subtree, and paths outside that directory are rejected.

## Repro + Verification

### Environment

- OS: macOS (Apple Silicon)
- Runtime/container: Node v22.22.0, pnpm 10.23.0
- Model/provider: openai-codex/gpt-5.3-codex (local UAT)
- Integration/channel (if any): local gateway + TUI
- Relevant config (redacted): default local `.openclaw` profile

### Steps

1. Run long chat/tool sessions and trigger memory/context pressure.
2. Execute a large-output tool command (e.g. `python - <<'PY'\nprint('x'*150000)\nPY`).
3. Verify that `toolResult.details.outputRef` appears in the transcript and that the payload is written under `sessions/tool-output/`.
4. Call `sessions_history` with `outputRefPath` to fetch the externalized payload, truncated to `outputRefMaxChars`.

### Expected

- History is token-budgeted deterministically.
- The summary sidecar persists without recursive `[SESSION_SUMMARY]` contamination.
- Large tool output is externalized; the transcript remains compact.
- `sessions_history` can safely retrieve the referenced payload with hash verification.

### Actual

- Matches expected behavior in local UAT and e2e coverage.

## Evidence

Attach at least one:

- [x] Failing test/log before + passing after
- [x] Trace/log snippets
- [ ] Screenshot/recording
- [x] Perf numbers (if relevant)

## Human Verification (required)

What you personally verified (not just CI), and how:

- Verified scenarios:
  - UAT session `agent:main:uat-summary-fix` confirms sidecar summary behavior and the loop fix.
  - UAT session `agent:main:uat-p5-tool-output-ref` confirms output ref persistence for oversized tool output.
  - On-demand retrieval via `sessions_history` `outputRefPath` is validated with new e2e coverage.
- Edge cases checked:
  - static budget overflow fallback
  - single-turn oversize preservation
  - transcript rewind recovery
  - output ref path containment + missing ref handling
- What you did **not** verify:
  - full cross-provider cost benchmark report (follow-up item)

## Compatibility / Migration

- Backward compatible? (`Yes/No`): Yes
- Config/env changes? (`Yes/No`): No (new tool params are optional)
- Migration needed? (`Yes/No`): No
- If yes, exact upgrade steps: N/A

## Failure Recovery (if this breaks)

- How to disable/revert this change quickly:
  - Revert commits `312f3401e` and `f43c5f41f` for tool-output ref pathing.
  - Revert commits `827d3e48c`, `b9490e95e`, `9642a3a6a`, `e22c86b41` for the summary sidecar path.
  - Revert commit `84e694680` for the context planner.
- Files/config to restore:
  - `src/agents/pi-embedded-runner/run/attempt.ts`
  - `src/agents/pi-embedded-runner/context-planner.ts`
  - `src/agents/pi-embedded-runner/session-summary.ts`
  - `src/agents/session-tool-result-guard.ts`
  - `src/agents/tools/sessions-history-tool.ts`
- Known bad symptoms reviewers should watch for:
  - repeated `[SESSION_SUMMARY]` recursion in prompts
  - missing/invalid `toolResult.details.outputRef` for large outputs
  - context overflow regressions in long sessions

## Risks and Mitigations

- Risk: the token estimator fallback (`chars/3.6`) can under- or over-estimate for some content mixes.
- Mitigation: the planner applies a safety ratio and reserve buffers; e2e coverage verifies the hard fallback behavior.
- Risk: additional file I/O for output payload retrieval.
- Mitigation: on-demand path only, hard character cap, and optional usage.
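The token-budget planning described in the summary and in the estimator risk can be sketched as follows. This is a minimal, hypothetical illustration, not the code in `context-planner.ts`: the `Message` shape, `planHistory`, and the `contextWindow`/`reserveTokens`/`safetyRatio` option names are assumptions. Only the `chars/3.6` fallback, the safety ratio plus reserve buffer, trimming from the oldest end, and always preserving the latest user turn come from this PR.

```typescript
// Hypothetical message shape; the real planner's types are not shown in this PR.
type Role = "user" | "assistant" | "toolResult";

interface Message {
  role: Role;
  text: string;
}

// Fallback heuristic quoted in the PR: roughly 3.6 characters per token.
const CHARS_PER_TOKEN = 3.6;

function estimateTokens(message: Message): number {
  return Math.ceil(message.text.length / CHARS_PER_TOKEN);
}

interface PlanOptions {
  contextWindow: number; // model context size in tokens (assumed option)
  reserveTokens: number; // held back for system prompt, tools, and the reply (assumed)
  safetyRatio: number;   // e.g. 0.9, absorbs estimator error (assumed)
}

interface PlanResult {
  kept: Message[];
  droppedCount: number;
  estimatedTokens: number;
}

function planHistory(messages: Message[], opts: PlanOptions): PlanResult {
  const budget = Math.max(
    0,
    Math.floor((opts.contextWindow - opts.reserveTokens) * opts.safetyRatio),
  );

  // Locate the latest user turn; it and everything after it are always kept,
  // even if that alone exceeds the budget (single-turn oversize preservation).
  let pinnedStart = messages.length;
  for (let i = messages.length - 1; i >= 0; i--) {
    if (messages[i].role === "user") {
      pinnedStart = i;
      break;
    }
  }

  let used = 0;
  for (let i = pinnedStart; i < messages.length; i++) {
    used += estimateTokens(messages[i]);
  }

  // Walk backwards, adding older messages while they still fit the budget.
  let start = pinnedStart;
  while (start > 0) {
    const cost = estimateTokens(messages[start - 1]);
    if (used + cost > budget) break;
    used += cost;
    start--;
  }

  return { kept: messages.slice(start), droppedCount: start, estimatedTokens: used };
}
```

Because trimming happens only at the oldest end, the kept history is always a contiguous suffix, so the same input always yields the same plan. The sketch does not model tool-call/tool-result pairing, which the PR handles separately through `sanitizeToolUseResultPairing`.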
### Validation Commands

- `pnpm tsgo`
- `pnpm test src/agents/pi-embedded-runner/context-planner.test.ts src/agents/pi-embedded-runner/session-summary.test.ts`
- `pnpm exec vitest run -c vitest.e2e.config.ts src/agents/pi-embedded-runner.limithistoryturns.e2e.test.ts src/agents/pi-embedded-runner/run/attempt.e2e.test.ts src/agents/memory-search.e2e.test.ts src/agents/tools/memory-tool.citations.e2e.test.ts src/agents/session-tool-result-guard.e2e.test.ts src/agents/session-tool-result-guard.tool-result-persist-hook.e2e.test.ts`
- `pnpm exec vitest run -c vitest.e2e.config.ts src/agents/openclaw-tools.sessions.e2e.test.ts -t sessions_history`
- `pnpm exec vitest run -c vitest.e2e.config.ts src/agents/tool-display.e2e.test.ts`

### Commit Set

- `84e694680` Runner: add token-budget context planning
- `520998e5a` Memory: cap injected recall snippets
- `e22c86b41` Runner: persist incremental session summary state
- `9642a3a6a` Runner: prevent session-summary prompt feedback loops
- `b9490e95e` Runner: recover session summary after transcript rewinds
- `827d3e48c` Runner: inject session summary only under context pressure
- `f43c5f41f` Session: persist oversized tool payloads as file refs
- `812107d5f` Docs: add memory kernel design and verification report
- `312f3401e` Sessions: add on-demand outputRef payload retrieval
- `d5db9792c` Docs: update memory kernel report for outputRef retrieval

AI-assisted: Yes (Codex); verified locally with targeted UAT and e2e suites.

Agent-Signoff: Creash-the-Lobster

<h3>Greptile Summary</h3>

This PR replaces turn-count-based context slicing with a deterministic token-budget context planner for the PI embedded runner, adds an incremental session-summary sidecar with loop/rewind hardening, externalizes oversized tool outputs to `tool-output/*.json` files with transcript `outputRef` pointers, and introduces on-demand payload retrieval via `sessions_history`.
- **Token-budget context planner** (`context-planner.ts`): Well-structured module that estimates token costs per message and trims the oldest messages to fit within a computed history budget, always preserving the latest user turn. Integrates cleanly with the existing `limitHistoryTurns` and `sanitizeToolUseResultPairing` pipeline.
- **Session summary sidecar** (`session-summary.ts`): Persists incremental summary state alongside session files, with rewind detection and `[SESSION_SUMMARY]` loop prevention. Summary injection is gated on context pressure or trimming, avoiding unnecessary prompt bloat. Minor dedup bug identified: consecutive duplicate detection only checks against prior state, not against accumulated additions.
- **Output ref externalization** (`session-tool-result-guard.ts`): Large tool results (more than 120K characters of text or more than 24K characters of details) are written to separate JSON files with SHA-256 hashes. Because of interface constraints, this uses synchronous file I/O within the `appendMessage` hot path, a deviation from the codebase's async I/O patterns.
- **On-demand retrieval** (`sessions-history-tool.ts`): New `outputRefPath`/`outputRefMaxChars` parameters allow fetching externalized payloads. Path containment is enforced via a `path.resolve` prefix check (lexical only; symlinks are not resolved). SHA-256 verification is informational: mismatched hashes don't prevent payload delivery.
- **Memory search cap** (`memory-search.ts`, `memory-tool.ts`): New `maxInjectedChars` config (default 4000) caps the total snippet characters injected per `memory_search` call, applied uniformly across backends.
- Config, schema, and display updates are consistent and minimal. Test coverage is solid, with both unit and e2e tests for all new features.

<h3>Confidence Score: 4/5</h3>

- This PR is safe to merge with minor issues flagged; no critical bugs or security vulnerabilities found.
- The implementation is well-structured, with thorough test coverage (unit + e2e), defensive error handling, and clear separation of concerns. The issues found are: (1) a minor dedup logic bug in session summary that only affects unlikely consecutive identical messages, (2) a style concern about sync I/O on the hot path with documented constraints, and (3) defense-in-depth suggestions for path containment and hash verification. None are blocking.
- `src/agents/tools/sessions-history-tool.ts` (path containment and hash verifi...
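The `outputRef` retrieval safeguards described above (a lexical `path.resolve` prefix check under `tool-output/`, a SHA-256 comparison that is reported rather than enforced, and a character cap) can be sketched in TypeScript as below. This is an illustrative sketch under assumptions, not the code in `sessions-history-tool.ts`: `resolveOutputRefPath`, `preparePayload`, `readOutputRef`, and the `RetrievedPayload` shape are hypothetical names, and the assumption that the stored hash covers the file's UTF-8 contents is also ours.

```typescript
import { createHash } from "node:crypto";
import { readFileSync } from "node:fs";
import * as path from "node:path";

// Hypothetical result shape; the real tool's response format is not shown in the PR.
interface RetrievedPayload {
  text: string;
  truncated: boolean;
  hashMatches: boolean; // informational only: a mismatch does not block delivery
}

// Resolve a caller-supplied outputRefPath and reject anything outside toolOutputDir.
// The comparison is purely lexical: symlinks inside the directory are not resolved.
function resolveOutputRefPath(toolOutputDir: string, requested: string): string {
  const root = path.resolve(toolOutputDir);
  const target = path.resolve(root, requested);
  // Appending path.sep stops sibling directories such as "tool-output-evil" from matching.
  if (!target.startsWith(root + path.sep)) {
    throw new Error(`outputRefPath is outside tool-output/: ${requested}`);
  }
  return target;
}

function sha256Hex(payload: string): string {
  return createHash("sha256").update(payload, "utf8").digest("hex");
}

// Compare against the stored hash (assumed to cover the UTF-8 contents) and cap the text.
function preparePayload(raw: string, expectedSha256: string, maxChars: number): RetrievedPayload {
  const hashMatches = sha256Hex(raw) === expectedSha256.toLowerCase();
  const truncated = raw.length > maxChars;
  return { text: truncated ? raw.slice(0, maxChars) : raw, truncated, hashMatches };
}

// On-demand read: file I/O happens only when a caller asks for a specific ref.
function readOutputRef(
  toolOutputDir: string,
  requested: string,
  expectedSha256: string,
  maxChars: number,
): RetrievedPayload {
  const file = resolveOutputRefPath(toolOutputDir, requested);
  const raw = readFileSync(file, "utf8");
  return preparePayload(raw, expectedSha256, maxChars);
}
```

Like the check described in the review, `path.resolve` never touches the filesystem, so a symlink placed inside `tool-output/` could still point elsewhere. One possible defense-in-depth step would be to canonicalize both `root` and `target` with `fs.realpathSync` before the prefix comparison. Similarly, a caller could refuse delivery when `hashMatches` is false instead of only reporting it.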