#13895: fix(usage): exclude cache tokens from context-window accounting
Labels: `agents`, `stale`, `size: S`, `experienced-contributor` · Cluster: Memory Management Enhancements
## Summary
Fixes #13853
`derivePromptTokens()` summed `input + cacheRead + cacheWrite` to calculate context-window usage. However, cache tokens (both `cacheRead` and `cacheWrite`) are already part of the prompt — they don't consume *additional* context window capacity. With Anthropic prompt caching, `cacheRead` frequently exceeds 100K tokens, inflating the derived total past the context cap (e.g. 200K) and triggering premature auto-compaction.
## Changes
- **`src/agents/usage.ts`**: `derivePromptTokens()` now returns only `input` tokens, excluding `cacheRead` and `cacheWrite` from context-window accounting. Cache metrics remain in `NormalizedUsage` for cost reporting.
- **`src/agents/usage.test.ts`**: Updated existing tests and added new cases covering the fix and its edge cases (input exceeding the context window, missing input, fallback to `usage.total`).
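The before/after behavior can be sketched as follows. This is a minimal illustration only: the `NormalizedUsage` field names and the exact signatures are assumptions, not the repo's actual types.

```typescript
// Illustrative shape only — the real NormalizedUsage in src/agents/usage.ts
// may differ. Cache fields stay on the object for cost reporting.
interface NormalizedUsage {
  input?: number;      // non-cached prompt tokens
  cacheRead?: number;  // tokens served from the prompt cache
  cacheWrite?: number; // tokens written to the prompt cache
  total?: number;
}

// Before the fix: cache tokens were added on top of input, double-counting
// prompt content that already sits inside the context window.
function derivePromptTokensOld(usage: NormalizedUsage): number {
  return (usage.input ?? 0) + (usage.cacheRead ?? 0) + (usage.cacheWrite ?? 0);
}

// After the fix: only input counts toward context-window usage; when input
// is missing, fall back to usage.total.
function derivePromptTokens(usage: NormalizedUsage): number {
  if (usage.input !== undefined) return usage.input;
  return usage.total ?? 0;
}

// Example: a cached Anthropic turn with a large cacheRead.
const turn: NormalizedUsage = { input: 1_200, cacheRead: 110_000, cacheWrite: 4_000 };
console.log(derivePromptTokensOld(turn)); // 115200 — inflated past realistic usage
console.log(derivePromptTokens(turn));    // 1200
```

With a 200K context window, the old accounting would read this turn as ~58% used after a handful of cached turns, while the new accounting reflects only the non-cached prompt tokens.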
## Test Plan
- [x] `derivePromptTokens` returns only `input`, ignoring cache tokens
- [x] `deriveSessionTotalTokens` no longer inflates with cache tokens
- [x] Context window capping still works when input exceeds the limit
- [x] Fallback to `usage.total` works when input is missing
- [x] All 27 related tests pass (`usage.test.ts`, `memory-flush.test.ts`, `status.test.ts`)
- [x] Lint passes (0 warnings, 0 errors)
- [x] Build succeeds
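The capping item above can be sketched like this (a hypothetical helper; the real capping lives in the repo's accounting code and its name and signature here are assumptions):

```typescript
// Sketch of context-window capping: never report more prompt tokens than the
// model's context window, even if a provider returns an input count above it.
function capToContextWindow(promptTokens: number, contextWindow: number): number {
  return Math.min(promptTokens, contextWindow);
}

console.log(capToContextWindow(250_000, 200_000)); // 200000
console.log(capToContextWindow(1_200, 200_000));   // 1200
```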
---
🤖 Generated with Claude Code (issue-hunter-pro)
<!-- greptile_comment -->
<h2>Greptile Overview</h2>
<h3>Greptile Summary</h3>
This PR changes context-window accounting to avoid double-counting Anthropic prompt-cache metrics. Specifically, `derivePromptTokens()` now returns only `usage.input` (excluding `cacheRead`/`cacheWrite`), and `deriveSessionTotalTokens()` therefore uses input tokens for session/context usage (with fallback to `usage.total` when input is missing, and context-window capping unchanged). Tests in `src/agents/usage.test.ts` were updated/expanded to cover the cache-inflation regression and edge cases (missing input, capping when input exceeds the window, undefined usage).
Net effect: session `totalTokens` used for `/status` percent-used and related context-window displays should no longer be inflated by very large cache read counts, preventing premature auto-compaction while preserving cache metrics for cost reporting.
<h3>Confidence Score: 5/5</h3>
- This PR is safe to merge with minimal risk.
- The change is small, well-scoped (token accounting only), and aligns with how cached tokens should be treated for context-window capacity. Reviewed call sites that persist/display session totals and found no invariant break; tests were updated and add coverage for the regression and key edge cases.
- No files require special attention
<!-- greptile_other_comments_section -->
<!-- /greptile_comment -->
## Most Similar PRs
- #22387: fix: session_status context tracking undercount for cached providers (1ucian · 2026-02-21 · 84.9%)
- #13841: fix: use last attempt prompt tokens for session total instead of ac... (1kuna · 2026-02-11 · 83.4%)
- #14483: fix(cli-runner): map Anthropic cache_creation_input_tokens (AlexAnys · 2026-02-12 · 80.3%)
- #17253: fix: propagate lastTurnTotal through usage accumulator for accurate... (robbyczgw-cla · 2026-02-15 · 79.0%)
- #15126: fix(status): avoid false 100% context usage when totals mirror context (AlexAnys · 2026-02-13 · 78.7%)
- #15726: fix(sessions): use model contextWindow instead of agent contextToke... (lailoo · 2026-02-13 · 78.4%)
- #4999: fix(memory-flush): use contextTokens instead of totalTokens for thr... (Farfadium · 2026-01-30 · 78.0%)
- #19412: fix(status): prefer configured contextTokens over session entry (rafaelipuente · 2026-02-17 · 77.2%)
- #5343: fix(memoryFlush): correct context token accounting for flush gating (jarvis-medmatic · 2026-01-31 · 77.0%)
- #11109: fix(tui): prefer config contextTokens over persisted session value (marezgui · 2026-02-07 · 76.1%)