#13841: fix: use last attempt prompt tokens for session total instead of accumulated sum
Labels: commands · agents · stale · size: S
Cluster: Memory Management Enhancements
## Problem
`deriveSessionTotalTokens` uses the accumulated usage across all tool calls and retry attempts in a single run. A run with 3-4 tool calls at ~65k prompt tokens each results in totalTokens of 195k+, capped to the context window (200k). This causes premature auto-compaction on every message with tool use.
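The inflation can be illustrated with a small sketch (numbers and field names are illustrative, mirroring the description rather than the repo's exact types):

```typescript
// Illustrative only: three tool calls in one run, each sent with roughly
// the same ~65k-token context (mostly served from prompt cache).
type Usage = { input: number; cacheRead: number; cacheWrite: number };

const attempts: Usage[] = [
  { input: 5_000, cacheRead: 60_000, cacheWrite: 0 },
  { input: 5_200, cacheRead: 60_000, cacheWrite: 0 },
  { input: 5_400, cacheRead: 60_000, cacheWrite: 0 },
];

// Buggy derivation: summing prompt-side tokens across every call.
const accumulated = attempts.reduce(
  (sum, u) => sum + u.input + u.cacheRead + u.cacheWrite,
  0,
); // 195_600 — near the 200k window, so auto-compaction fires

// The actual context size is just the last call's prompt.
const last = attempts[attempts.length - 1];
const actualPrompt = last.input + last.cacheRead + last.cacheWrite; // 65_400
```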
## Root Cause
`runEmbeddedPiAgent` merges usage across all assistant messages and retry attempts via `createUsageAccumulator`. This accumulated usage is passed to `persistSessionUsageUpdate` → `deriveSessionTotalTokens`, which computes `input + cacheRead + cacheWrite`. The sum across multiple tool calls vastly exceeds the actual prompt size of any single call.
## Fix
Track the last attempt's prompt token count separately (`promptTokens`) and pass it through to `deriveSessionTotalTokens` as an override. This reflects the actual context size sent to the API rather than an inflated accumulation.
Additionally, exclude cache tokens from prompt token derivation: `derivePromptTokens` now computes `input + output` only, with no `cacheRead` or `cacheWrite`.
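A minimal sketch of the override flow described above (function shapes are assumed from this description, not copied from the source):

```typescript
type Usage = { input: number; cacheRead: number; cacheWrite: number; output: number };

// Sketch: prefer the last attempt's prompt size when provided, fall back to
// the accumulated sum, and cap at the model's context window.
function deriveSessionTotalTokens(
  accumulated: Usage,
  contextWindow: number,
  promptTokens?: number, // last attempt's prompt token count, when known
): number {
  const total =
    promptTokens ??
    accumulated.input + accumulated.cacheRead + accumulated.cacheWrite;
  return Math.min(total, contextWindow);
}

// Companion change: prompt-token derivation ignores cache tokens entirely.
function derivePromptTokens(u: Usage): number {
  return u.input + u.output; // no cacheRead / cacheWrite
}
```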
Related issues: #13698, #5457, #8196, #15006
<!-- greptile_comment -->
<h2>Greptile Overview</h2>
<h3>Greptile Summary</h3>
This PR changes how `session.totalTokens` is derived so tool-heavy runs don’t inflate totals by summing usage across multiple tool calls/retries. It introduces an optional `promptTokens` field on `EmbeddedPiAgentMeta`, computes it from the last attempt’s usage, and threads it through session-store update paths (`persistSessionUsageUpdate`, agent-runner/followup-runner, session-store, cron isolated agent). `deriveSessionTotalTokens` now accepts a `promptTokens` override and prefers it when present, still capping to the model context window.
Primary risk area is the selection of usage used to compute `promptTokens` for the last attempt; if it uses per-message usage instead of attempt aggregate, it can undercount context sizing and reduce compaction frequency.
<h3>Confidence Score: 4/5</h3>
- Mostly safe to merge, but verify last-attempt prompt token calculation
- Changes are localized and add a straightforward override flow for total token computation with test coverage. The main remaining concern is that `promptTokens` may be computed from per-message usage (`lastAssistant.usage`) rather than attempt-level usage, which can underreport context usage in tool-heavy attempts and defeat the purpose of the fix in some cases.
- src/agents/pi-embedded-runner/run.ts
<!-- greptile_other_comments_section -->
<!-- /greptile_comment -->
Most Similar PRs
- #17253: fix: propagate lastTurnTotal through usage accumulator for accurate... · by robbyczgw-cla · 2026-02-15 · 83.5%
- #13895: fix(usage): exclude cache tokens from context-window accounting · by zerone0x · 2026-02-11 · 83.4%
- #22387: fix: session_status context tracking undercount for cached providers · by 1ucian · 2026-02-21 · 81.6%
- #8477: TUI: persist session token totals when usage metadata is missing · by LarHope · 2026-02-04 · 80.8%
- #11999: fix: add session-growth guard to prevent unbounded session store gr... · by reverendrewind · 2026-02-08 · 79.2%
- #8961: feat: smarter compaction tool truncation + token count in system pr... · by SocialNerd42069 · 2026-02-04 · 77.7%
- #14879: fix: persist session metadata to sessions.json after context pruning · by skylarkoo7 · 2026-02-12 · 77.4%
- #15173: fix(session): reset totalTokens after compaction when estimate unav... · by EnzoGaillardSystems · 2026-02-13 · 76.5%
- #14913: fix: update context pruning to notify session metadata after prunin... · by ScreenTechnicals · 2026-02-12 · 75.7%
- #9085: fix: improve stability for terminated responses and telegram retries · by vladdick88 · 2026-02-04 · 75.7%