#13841: fix: use last attempt prompt tokens for session total instead of accumulated sum
Labels: commands · agents · stale · size: S
Cluster: Memory Management Enhancements
## Problem
`deriveSessionTotalTokens` uses the accumulated usage across all tool calls and retry attempts in a single run. A run with 3-4 tool calls at ~65k prompt tokens each results in totalTokens of 195k+, capped to the context window (200k). This causes premature auto-compaction on every message with tool use.
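The inflation can be illustrated with a small sketch (numbers and field names are illustrative, mirroring the description rather than the repo's exact types):

```typescript
// Illustrative only: three tool calls in one run, each sent with roughly
// the same ~65k-token context (mostly served from prompt cache).
type Usage = { input: number; cacheRead: number; cacheWrite: number };

const attempts: Usage[] = [
  { input: 5_000, cacheRead: 60_000, cacheWrite: 0 },
  { input: 5_200, cacheRead: 60_000, cacheWrite: 0 },
  { input: 5_400, cacheRead: 60_000, cacheWrite: 0 },
];

// Buggy derivation: summing prompt-side tokens across every call.
const accumulated = attempts.reduce(
  (sum, u) => sum + u.input + u.cacheRead + u.cacheWrite,
  0,
); // 195_600 — near the 200k window, so auto-compaction fires

// The actual context size is just the last call's prompt.
const last = attempts[attempts.length - 1];
const actualPrompt = last.input + last.cacheRead + last.cacheWrite; // 65_400
```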
## Root Cause
`runEmbeddedPiAgent` merges usage across all assistant messages and retry attempts via `createUsageAccumulator`. This accumulated usage is passed to `persistSessionUsageUpdate` → `deriveSessionTotalTokens`, which computes `input + cacheRead + cacheWrite`. The sum across multiple tool calls vastly exceeds the actual prompt size of any single call.
## Fix
Track the last attempt's prompt token count separately (`promptTokens`) and pass it through to `deriveSessionTotalTokens` as an override. This reflects the actual context size sent to the API rather than an inflated accumulation.
Additionally, exclude cache tokens from prompt token derivation: `derivePromptTokens` now computes `input + output` only, with no `cacheRead` or `cacheWrite`.
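A minimal sketch of the override flow described above (function shapes are assumed from this description, not copied from the source):

```typescript
type Usage = { input: number; cacheRead: number; cacheWrite: number; output: number };

// Sketch: prefer the last attempt's prompt size when provided, fall back to
// the accumulated sum, and cap at the model's context window.
function deriveSessionTotalTokens(
  accumulated: Usage,
  contextWindow: number,
  promptTokens?: number, // last attempt's prompt token count, when known
): number {
  const total =
    promptTokens ??
    accumulated.input + accumulated.cacheRead + accumulated.cacheWrite;
  return Math.min(total, contextWindow);
}

// Companion change: prompt-token derivation ignores cache tokens entirely.
function derivePromptTokens(u: Usage): number {
  return u.input + u.output; // no cacheRead / cacheWrite
}
```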
Related issues: #13698, #5457, #8196, #15006
<!-- greptile_comment -->
<h2>Greptile Overview</h2>
<h3>Greptile Summary</h3>
This PR changes how `session.totalTokens` is derived so tool-heavy runs don’t inflate totals by summing usage across multiple tool calls/retries. It introduces an optional `promptTokens` field on `EmbeddedPiAgentMeta`, computes it from the last attempt’s usage, and threads it through session-store update paths (`persistSessionUsageUpdate`, agent-runner/followup-runner, session-store, cron isolated agent). `deriveSessionTotalTokens` now accepts a `promptTokens` override and prefers it when present, still capping to the model context window.
Primary risk area is the selection of usage used to compute `promptTokens` for the last attempt; if it uses per-message usage instead of attempt aggregate, it can undercount context sizing and reduce compaction frequency.
<h3>Confidence Score: 4/5</h3>
- Mostly safe to merge, but verify last-attempt prompt token calculation
- Changes are localized and add a straightforward override flow for total token computation with test coverage. The main remaining concern is that `promptTokens` may be computed from per-message usage (`lastAssistant.usage`) rather than attempt-level usage, which can underreport context usage in tool-heavy attempts and defeat the purpose of the fix in some cases.
- src/agents/pi-embedded-runner/run.ts
<!-- greptile_other_comments_section -->
<!-- /greptile_comment -->
Most Similar PRs
- #17253: fix: propagate lastTurnTotal through usage accumulator for accurate... · by robbyczgw-cla · 2026-02-15 · 83.5%
- #13895: fix(usage): exclude cache tokens from context-window accounting · by zerone0x · 2026-02-11 · 83.4%
- #22387: fix: session_status context tracking undercount for cached providers · by 1ucian · 2026-02-21 · 81.6%
- #8477: TUI: persist session token totals when usage metadata is missing · by LarHope · 2026-02-04 · 80.8%
- #11999: fix: add session-growth guard to prevent unbounded session store gr... · by reverendrewind · 2026-02-08 · 79.2%
- #8961: feat: smarter compaction tool truncation + token count in system pr... · by SocialNerd42069 · 2026-02-04 · 77.7%
- #14879: fix: persist session metadata to sessions.json after context pruning · by skylarkoo7 · 2026-02-12 · 77.4%
- #15173: fix(session): reset totalTokens after compaction when estimate unav... · by EnzoGaillardSystems · 2026-02-13 · 76.5%
- #14913: fix: update context pruning to notify session metadata after prunin... · by ScreenTechnicals · 2026-02-12 · 75.7%
- #9085: fix: improve stability for terminated responses and telegram retries · by vladdick88 · 2026-02-04 · 75.7%