
#13895: fix(usage): exclude cache tokens from context-window accounting

by zerone0x open 2026-02-11 05:19
agents stale size: S experienced-contributor
## Summary

Fixes #13853

`derivePromptTokens()` summed `input + cacheRead + cacheWrite` to calculate context-window usage. However, cache tokens (both `cacheRead` and `cacheWrite`) are already part of the prompt; they don't consume *additional* context window capacity. With Anthropic prompt caching, `cacheRead` frequently exceeds 100K tokens, inflating the derived total past the context cap (e.g. 200K) and triggering premature auto-compaction.

## Changes

- **`src/agents/usage.ts`**: `derivePromptTokens()` now returns only `input` tokens, excluding `cacheRead` and `cacheWrite` from context-window accounting. Cache metrics remain in `NormalizedUsage` for cost reporting.
- **`src/agents/usage.test.ts`**: Updated existing tests and added new test cases covering the fix, edge cases (input exceeds context window, missing input, fallback to total).

## Test Plan

- [x] `derivePromptTokens` returns only `input`, ignoring cache tokens
- [x] `deriveSessionTotalTokens` no longer inflates with cache tokens
- [x] Context window capping still works when input exceeds the limit
- [x] Fallback to `usage.total` works when input is missing
- [x] All 27 related tests pass (`usage.test.ts`, `memory-flush.test.ts`, `status.test.ts`)
- [x] Lint passes (0 warnings, 0 errors)
- [x] Build succeeds

---

🤖 Generated with Claude Code (issue-hunter-pro)

<!-- greptile_comment -->

<h2>Greptile Overview</h2>

<h3>Greptile Summary</h3>

This PR changes context-window accounting to avoid double-counting Anthropic prompt-cache metrics. Specifically, `derivePromptTokens()` now returns only `usage.input` (excluding `cacheRead`/`cacheWrite`), and `deriveSessionTotalTokens()` therefore uses input tokens for session/context usage (with fallback to `usage.total` when input is missing, and context-window capping unchanged). Tests in `src/agents/usage.test.ts` were updated/expanded to cover the cache-inflation regression and edge cases (missing input, capping when input exceeds the window, undefined usage).

Net effect: session `totalTokens` used for `/status` percent-used and related context-window displays should no longer be inflated by very large cache read counts, preventing premature auto-compaction while preserving cache metrics for cost reporting.

<h3>Confidence Score: 5/5</h3>

- This PR is safe to merge with minimal risk.
- The change is small, well-scoped (token accounting only), and aligns with how cached tokens should be treated for context-window capacity. Reviewed call sites that persist/display session totals and found no invariant break; tests were updated and add coverage for the regression and key edge cases.
- No files require special attention

<!-- greptile_other_comments_section -->

<!-- /greptile_comment -->
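
For readers without the diff at hand, the accounting change described above can be sketched roughly as follows. This is an illustration only, not the code in `src/agents/usage.ts`: the `UsageSketch` type, the three function names, and the sample numbers are all hypothetical, and the real `NormalizedUsage` shape and function signatures may differ. It only mirrors the behaviour stated in the description (sum of three counters before, `input` alone after, fallback to `total`, cap at the context window).

```typescript
// Hypothetical stand-in for NormalizedUsage; the real type may have more fields.
interface UsageSketch {
  input?: number;
  cacheRead?: number;
  cacheWrite?: number;
  total?: number;
}

type Deriver = (usage?: UsageSketch) => number | undefined;

// Behaviour the PR describes as the old one: cache counters are added to input.
const promptTokensBefore: Deriver = (usage) => {
  if (!usage) return undefined;
  const sum =
    (usage.input ?? 0) + (usage.cacheRead ?? 0) + (usage.cacheWrite ?? 0);
  return sum > 0 ? sum : undefined;
};

// Behaviour the PR describes as the new one: only input is counted.
const promptTokensAfter: Deriver = (usage) => {
  const input = usage?.input;
  return typeof input === "number" && input > 0 ? input : undefined;
};

// Assumed session-total logic: use the derived prompt tokens, fall back to
// usage.total when they are missing, then cap at the context window.
function sessionTotalSketch(
  usage: UsageSketch | undefined,
  contextWindow: number,
  derive: Deriver,
): number | undefined {
  const tokens = derive(usage) ?? usage?.total;
  return tokens === undefined ? undefined : Math.min(tokens, contextWindow);
}

// Made-up sample: small fresh input, large cache counters, 200K window.
const sample: UsageSketch = { input: 12_000, cacheRead: 150_000, cacheWrite: 45_000 };
const windowSize = 200_000;

// 12_000 + 150_000 + 45_000 = 207_000, capped to 200_000 (window looks full).
const before = sessionTotalSketch(sample, windowSize, promptTokensBefore);
// Only input is counted: 12_000.
const after = sessionTotalSketch(sample, windowSize, promptTokensAfter);
// No input reported: falls back to total, 5_000.
const fallback = sessionTotalSketch({ total: 5_000 }, windowSize, promptTokensAfter);
```

With these sample numbers the old accounting reports a full window while the new one reports 6% usage, which is the premature auto-compaction scenario the PR targets.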
