#17345: feat: Memory kernel rebuild with token budgeting, summary sidecar, and outputRef retrieval
Labels: docs, agents, size: XL
Cluster: Memory Management Enhancements
## Summary
- Problem: OpenClaw context assembly relied on turn-count slicing and late compaction, which caused context bloat, unstable continuity, and expensive cache write spikes in long sessions.
- Why it matters: This affects reliability (summary loops/orphans), memory quality, and token/cost efficiency in real agent sessions.
- What changed:
  - Added deterministic token-budget history planning in the PI embedded runner.
  - Capped injected memory retrieval payloads (`maxInjectedChars`, default 4000).
  - Added incremental session-summary sidecar state with loop/rewind hardening and context-pressure-based injection.
  - Externalized oversized tool outputs into `tool-output/*.json` with transcript `details.outputRef` pointers.
  - Added on-demand `outputRef` payload retrieval to `sessions_history` (`outputRefPath`, `outputRefMaxChars`) with path scoping and SHA-256 verification.
  - Added technical design + implementation report in docs.
- What did NOT change (scope boundary): No provider/model contract changes, no migration of existing memory stores, and no change to default session storage locations.
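
For reviewers, a minimal sketch of the token-budget history planning described above, assuming a simplified message shape. `Message`, `estimateTokens`, and `planHistory` are hypothetical names chosen for illustration and are not taken from `context-planner.ts`. The estimator is the `chars/3.6` fallback named under Risks and Mitigations.

```typescript
// Hypothetical message shape; the real transcript type in OpenClaw may differ.
interface Message {
  role: "user" | "assistant" | "toolResult";
  text: string;
}

// Fallback estimator mentioned in this PR's risk notes: roughly chars / 3.6.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 3.6);
}

// Keep the newest messages that fit in `budgetTokens`, dropping the oldest first.
// The latest user turn (and anything after it) is always kept, even if it alone
// exceeds the budget.
function planHistory(messages: Message[], budgetTokens: number): Message[] {
  let lastUser = -1;
  for (let i = messages.length - 1; i >= 0; i--) {
    if (messages[i].role === "user") {
      lastUser = i;
      break;
    }
  }
  const kept: Message[] = [];
  let used = 0;
  for (let i = messages.length - 1; i >= 0; i--) {
    const cost = estimateTokens(messages[i].text);
    const mustKeep = lastUser !== -1 && i >= lastUser;
    if (!mustKeep && used + cost > budgetTokens) break; // stop at the first message that does not fit
    kept.unshift(messages[i]);
    used += cost;
  }
  return kept;
}
```

Because the walk runs newest to oldest and stops at the first message that does not fit, the same input always yields the same contiguous suffix of the history.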
## Change Type (select all)
- [x] Bug fix
- [x] Feature
- [x] Refactor
- [x] Docs
- [x] Security hardening
- [ ] Chore/infra
## Scope (select all touched areas)
- [x] Gateway / orchestration
- [x] Skills / tool execution
- [ ] Auth / tokens
- [x] Memory / storage
- [ ] Integrations
- [x] API / contracts
- [ ] UI / DX
- [ ] CI/CD / infra
## Linked Issue/PR
- Closes #N/A
- Related #17345
## User-visible / Behavior Changes
- `sessions_history` now preserves `toolResult.details.outputRef` metadata when history is sanitized.
- `sessions_history` now supports optional on-demand payload retrieval:
  - `outputRefPath`
  - `outputRefMaxChars`
- Large tool outputs are persisted as file refs instead of bloating transcript message content.
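
A rough sketch of the externalization idea, assuming the 120K-character text threshold quoted in the review summary. The `OutputRef` field names, the file naming, the placeholder text, and the choice to hash the payload text (rather than the file bytes) are all assumptions made for illustration; `session-tool-result-guard.ts` may differ on every one of them.

```typescript
import { createHash } from "node:crypto";
import { existsSync, mkdirSync, mkdtempSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import * as path from "node:path";

// Threshold quoted in the review summary (>120K chars of text).
const MAX_INLINE_TEXT_CHARS = 120_000;

// Hypothetical pointer shape; field names are assumptions.
interface OutputRef {
  path: string; // relative to the session directory
  sha256: string; // hex digest of the stored payload text
  chars: number; // original payload length
}

interface StoredResult {
  text: string;
  details?: { outputRef: OutputRef };
}

// Small outputs stay inline; large ones are written to tool-output/<id>.json
// and replaced in the transcript by a short placeholder plus a pointer.
function externalizeIfLarge(sessionDir: string, id: string, text: string): StoredResult {
  if (text.length <= MAX_INLINE_TEXT_CHARS) return { text };
  const relPath = path.join("tool-output", `${id}.json`);
  const absPath = path.join(sessionDir, relPath);
  mkdirSync(path.dirname(absPath), { recursive: true });
  writeFileSync(absPath, JSON.stringify({ text }), "utf8");
  const sha256 = createHash("sha256").update(text, "utf8").digest("hex");
  return {
    text: `[output externalized: ${text.length} chars]`,
    details: { outputRef: { path: relPath, sha256, chars: text.length } },
  };
}

// Usage: a 150,000-char output moves to a file; a short one stays inline.
const demoDir = mkdtempSync(path.join(tmpdir(), "session-"));
const big = externalizeIfLarge(demoDir, "call-1", "x".repeat(150_000));
const small = externalizeIfLarge(demoDir, "call-2", "ok");
console.log(big.text, small.details === undefined);
console.log(existsSync(path.join(demoDir, "tool-output", "call-1.json")));
```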
## Security Impact (required)
- New permissions/capabilities? (`Yes/No`): Yes
- Secrets/tokens handling changed? (`Yes/No`): No
- New/changed network calls? (`Yes/No`): No
- Command/tool execution surface changed? (`Yes/No`): Yes
- Data access scope changed? (`Yes/No`): Yes
- If any `Yes`, explain risk + mitigation:
  - Risk: payload file read path traversal via `outputRefPath`.
  - Mitigation: retrieval is constrained to the session-local `tool-output/` subtree and rejects paths outside that directory.
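
To make the mitigation concrete, here is one way a containment check could look. This is a sketch under assumptions: `resolveOutputRefPath` is a hypothetical helper, and it assumes `outputRefPath` is given relative to the session directory. Like the check described in the review summary, it is lexical only and does not resolve symlinks.

```typescript
import * as path from "node:path";

// Returns the absolute path if `requested` stays inside <sessionDir>/tool-output,
// otherwise null. Lexical check only: symlinks are not resolved.
function resolveOutputRefPath(sessionDir: string, requested: string): string | null {
  const root = path.resolve(sessionDir, "tool-output");
  const target = path.resolve(sessionDir, requested);
  const rel = path.relative(root, target);
  const escapes = rel === ".." || rel.startsWith(".." + path.sep) || path.isAbsolute(rel);
  // An empty `rel` means the request named the directory itself, not a payload file.
  return rel !== "" && !escapes ? target : null;
}
```

Comparing via `path.relative` avoids the classic prefix pitfall where a sibling such as `tool-output-evil/` passes a plain `startsWith` test. A symlink-aware variant could compare `fs.realpathSync` results for both paths before reading.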
## Repro + Verification
### Environment
- OS: macOS (Apple Silicon)
- Runtime/container: Node v22.22.0, pnpm 10.23.0
- Model/provider: openai-codex/gpt-5.3-codex (local UAT)
- Integration/channel (if any): local gateway + TUI
- Relevant config (redacted): default local `.openclaw` profile
### Steps
1. Run long chat/tool sessions and trigger memory/context pressure.
2. Execute large-output tool command (e.g. `python - <<'PY'\nprint('x'*150000)\nPY`).
3. Verify `toolResult.details.outputRef` appears in transcript and payload is written under `sessions/tool-output/`.
4. Call `sessions_history` with `outputRefPath` to fetch the externalized payload, truncated to `outputRefMaxChars`.
### Expected
- History is token-budgeted deterministically.
- Summary sidecar persists without recursive `[SESSION_SUMMARY]` contamination.
- Large tool output is externalized; transcript remains compact.
- `sessions_history` can retrieve referenced payload safely with hash verification.
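
The retrieval expectation can be sketched as a pure function over an already-loaded payload. `readVerified` and `RetrievedPayload` are hypothetical names, and hashing the payload text (rather than the raw file bytes) is an assumption. The sketch reports a hash mismatch instead of refusing delivery, which mirrors the informational behavior noted in the review summary.

```typescript
import { createHash } from "node:crypto";

interface RetrievedPayload {
  text: string; // payload, cut to at most maxChars characters
  truncated: boolean;
  hashMatches: boolean;
}

// Verify the payload against the recorded digest and apply the character cap.
function readVerified(payload: string, expectedSha256: string, maxChars: number): RetrievedPayload {
  const actual = createHash("sha256").update(payload, "utf8").digest("hex");
  const hashMatches = actual === expectedSha256;
  const truncated = payload.length > maxChars;
  return { text: truncated ? payload.slice(0, maxChars) : payload, truncated, hashMatches };
}
```

A stricter variant could return an error when `hashMatches` is false, if corrupted payloads should never reach the model.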
### Actual
- Matches expected behavior in local UAT and e2e coverage.
## Evidence
Attach at least one:
- [x] Failing test/log before + passing after
- [x] Trace/log snippets
- [ ] Screenshot/recording
- [x] Perf numbers (if relevant)
## Human Verification (required)
What you personally verified (not just CI), and how:
- Verified scenarios:
  - UAT session `agent:main:uat-summary-fix` confirms sidecar summary behavior and loop fix.
  - UAT session `agent:main:uat-p5-tool-output-ref` confirms output ref persistence for oversized tool output.
  - On-demand retrieval path via `sessions_history outputRefPath` validated with new e2e coverage.
- Edge cases checked:
  - static budget overflow fallback
  - single-turn oversize preservation
  - transcript rewind recovery
  - output ref path containment + missing ref handling
- What you did **not** verify:
  - full cross-provider cost benchmark report (follow-up item)
## Compatibility / Migration
- Backward compatible? (`Yes/No`): Yes
- Config/env changes? (`Yes/No`): No (new tool params are optional)
- Migration needed? (`Yes/No`): No
- If yes, exact upgrade steps: N/A
## Failure Recovery (if this breaks)
- How to disable/revert this change quickly:
  - Revert commits `312f3401e` and `f43c5f41f` for tool-output ref pathing.
  - Revert commits `827d3e48c`, `b9490e95e`, `9642a3a6a`, `e22c86b41` for summary sidecar path.
  - Revert commit `84e694680` for context planner.
- Files/config to restore:
  - `src/agents/pi-embedded-runner/run/attempt.ts`
  - `src/agents/pi-embedded-runner/context-planner.ts`
  - `src/agents/pi-embedded-runner/session-summary.ts`
  - `src/agents/session-tool-result-guard.ts`
  - `src/agents/tools/sessions-history-tool.ts`
- Known bad symptoms reviewers should watch for:
  - repeated `[SESSION_SUMMARY]` recursion in prompts
  - missing/invalid `toolResult.details.outputRef` for large outputs
  - context overflow regressions in long sessions
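
For the first symptom, one plausible guard is to keep injected summaries out of the text that feeds the next summary. The sketch below is an assumption about the general approach, not the logic in `session-summary.ts`; `stripInjectedSummaries` is a hypothetical name and messages are simplified to plain strings.

```typescript
const SUMMARY_MARKER = "[SESSION_SUMMARY]";

// Drop any message that carries an injected summary so the summarizer never
// re-reads its own earlier output.
function stripInjectedSummaries(messages: string[]): string[] {
  return messages.filter((m) => !m.includes(SUMMARY_MARKER));
}
```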
## Risks and Mitigations
- Risk: token estimator fallback (`chars/3.6`) can under/over-estimate for some content mixes.
  - Mitigation: planner uses safety ratio and reserve buffers; e2e coverage verifies hard fallback behavior.
- Risk: additional file I/O for output payload retrieval.
  - Mitigation: on-demand path only, hard character cap, and optional usage.
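
A worked example of how a safety ratio and reserve could absorb estimator error. The budget formula and all numbers here are made up for illustration; this PR description does not state the actual ratio or reserve values, and `historyBudget` is a hypothetical name.

```typescript
// Fallback estimate named in the risk note: about 3.6 characters per token.
function estimateTokensFallback(text: string): number {
  return Math.ceil(text.length / 3.6);
}

// Hypothetical budget formula: scale the context window by a safety ratio,
// then subtract a reserve for the system prompt and the model's reply.
function historyBudget(contextWindow: number, safetyRatio: number, reserveTokens: number): number {
  return Math.max(0, Math.floor(contextWindow * safetyRatio) - reserveTokens);
}

// Made-up numbers: 200,000-token window, 0.5 safety ratio, 20,000-token reserve.
const budget = historyBudget(200_000, 0.5, 20_000);
// The 150,000-char payload from the repro steps, estimated with the fallback.
const fits = estimateTokensFallback("x".repeat(150_000)) <= budget;
console.log(budget, fits);
```

The margin left by the ratio and reserve is what tolerates the estimator being off for content that tokenizes more densely than 3.6 characters per token.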
### Validation Commands
- `pnpm tsgo`
- `pnpm test src/agents/pi-embedded-runner/context-planner.test.ts src/agents/pi-embedded-runner/session-summary.test.ts`
- `pnpm exec vitest run -c vitest.e2e.config.ts src/agents/pi-embedded-runner.limithistoryturns.e2e.test.ts src/agents/pi-embedded-runner/run/attempt.e2e.test.ts src/agents/memory-search.e2e.test.ts src/agents/tools/memory-tool.citations.e2e.test.ts src/agents/session-tool-result-guard.e2e.test.ts src/agents/session-tool-result-guard.tool-result-persist-hook.e2e.test.ts`
- `pnpm exec vitest run -c vitest.e2e.config.ts src/agents/openclaw-tools.sessions.e2e.test.ts -t sessions_history`
- `pnpm exec vitest run -c vitest.e2e.config.ts src/agents/tool-display.e2e.test.ts`
### Commit Set
- `84e694680` Runner: add token-budget context planning
- `520998e5a` Memory: cap injected recall snippets
- `e22c86b41` Runner: persist incremental session summary state
- `9642a3a6a` Runner: prevent session-summary prompt feedback loops
- `b9490e95e` Runner: recover session summary after transcript rewinds
- `827d3e48c` Runner: inject session summary only under context pressure
- `f43c5f41f` Session: persist oversized tool payloads as file refs
- `812107d5f` Docs: add memory kernel design and verification report
- `312f3401e` Sessions: add on-demand outputRef payload retrieval
- `d5db9792c` Docs: update memory kernel report for outputRef retrieval
AI-assisted: Yes (Codex); verified locally with targeted UAT and e2e suites.
Agent-Signoff: Creash-the-Lobster
<!-- greptile_comment -->
<h3>Greptile Summary</h3>
This PR replaces turn-count-based context slicing with a deterministic token-budget context planner for the PI embedded runner, adds an incremental session-summary sidecar with loop/rewind hardening, externalizes oversized tool outputs to `tool-output/*.json` files with transcript `outputRef` pointers, and introduces on-demand payload retrieval via `sessions_history`.
- **Token-budget context planner** (`context-planner.ts`): Well-structured module that estimates token costs per message and trims oldest messages to fit within a computed history budget, always preserving the latest user turn. Integrates cleanly with the existing `limitHistoryTurns` and `sanitizeToolUseResultPairing` pipeline.
- **Session summary sidecar** (`session-summary.ts`): Persists incremental summary state alongside session files, with rewind detection and `[SESSION_SUMMARY]` loop prevention. Summary injection is gated on context pressure or trimming, avoiding unnecessary prompt bloat. Minor dedup bug identified (consecutive duplicate detection only checks against prior state, not accumulated additions).
- **Output ref externalization** (`session-tool-result-guard.ts`): Large tool results (>120K chars text or >24K chars details) are written to separate JSON files with SHA-256 hashes. Uses synchronous file I/O within the `appendMessage` hot path due to interface constraints — a deviation from the codebase's async I/O patterns.
- **On-demand retrieval** (`sessions-history-tool.ts`): New `outputRefPath`/`outputRefMaxChars` parameters allow fetching externalized payloads. Path containment is enforced via `path.resolve` prefix check (lexical only — does not resolve symlinks). SHA-256 verification is informational; mismatched hashes don't prevent payload delivery.
- **Memory search cap** (`memory-search.ts`, `memory-tool.ts`): New `maxInjectedChars` config (default 4000) caps total snippet characters injected per `memory_search` call, applied uniformly across backends.
- Config, schema, and display updates are consistent and minimal. Test coverage is solid with both unit and e2e tests for all new features.
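
As an illustration of the `maxInjectedChars` cap, the sketch below keeps ranked snippets until the character allowance is used up. `Snippet` and `capInjected` are hypothetical names, and cutting the last snippet to the remaining allowance (rather than dropping it) is an assumption; the actual `memory-search.ts` behavior may differ.

```typescript
// Hypothetical snippet shape; real memory_search results carry more fields.
interface Snippet {
  source: string;
  text: string;
}

// Keep snippets in ranked order until the total character count reaches the cap.
// A snippet that would cross the cap is cut to the remaining allowance.
function capInjected(snippets: Snippet[], maxInjectedChars = 4000): Snippet[] {
  const out: Snippet[] = [];
  let remaining = maxInjectedChars;
  for (const s of snippets) {
    if (remaining <= 0) break;
    const text = s.text.length > remaining ? s.text.slice(0, remaining) : s.text;
    out.push({ ...s, text });
    remaining -= text.length;
  }
  return out;
}
```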
<h3>Confidence Score: 4/5</h3>
- This PR is safe to merge with minor issues flagged; no critical bugs or security vulnerabilities found.
- The implementation is well-structured with thorough test coverage (unit + e2e), defensive error handling, and clear separation of concerns. The issues found are: (1) a minor dedup logic bug in session summary that only affects unlikely consecutive identical messages, (2) a style concern about sync I/O on the hot path with documented constraints, (3) defense-in-depth suggestions for path containment and hash verification. None are blocking.
- `src/agents/tools/sessions-history-tool.ts` (path containment and hash verifi...
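
For the dedup issue flagged above, one possible shape of a fix is to compare each new entry against the running result rather than only against the prior state. This is a sketch under assumptions: `appendDeduped` is a hypothetical helper and summary entries are simplified to strings.

```typescript
// Append new entries, skipping any entry identical to the one immediately
// before it. The comparison uses the running result, so duplicates inside
// `additions` are caught as well as duplicates of the prior state's tail.
function appendDeduped(prior: string[], additions: string[]): string[] {
  const result = [...prior];
  for (const entry of additions) {
    if (result.length > 0 && result[result.length - 1] === entry) continue;
    result.push(entry);
  }
  return result;
}
```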
Most Similar PRs
- #10915: fix: prevent session bloat from oversized tool results and improve ... (by DukeDeSouth · 2026-02-07 · 78.2%)
- #16261: feat(agents): add two-tier tool output truncation and excludeFromCo... (by ProgramCaiCai · 2026-02-14 · 77.0%)
- #9415: Artifact-first memory: externalize tool outputs + deterministic recall (by jroth1111 · 2026-02-05 · 75.5%)
- #9012: fix(memory): resilient flush for large sessions [AI-assisted] (by cheenu1092-oss · 2026-02-04 · 75.2%)
- #21242: fix(memory): add token budget limits for memory tools (#21187) (by Asm3r96 · 2026-02-19 · 75.1%)
- #5343: fix(memoryFlush): correct context token accounting for flush gating (by jarvis-medmatic · 2026-01-31 · 74.9%)
- #11999: fix: add session-growth guard to prevent unbounded session store gr... (by reverendrewind · 2026-02-08 · 74.3%)
- #22387: fix: session_status context tracking undercount for cached providers (by 1ucian · 2026-02-21 · 74.3%)
- #11825: fix: keep tool_use/tool_result pairs together during session compac... (by C31gordon · 2026-02-08 · 74.2%)
- #14879: fix: persist session metadata to sessions.json after context pruning (by skylarkoo7 · 2026-02-12 · 74.2%)