#5343: fix(memoryFlush): correct context token accounting for flush gating

by jarvis-medmatic open 2026-01-31 11:08

commands agents size: L

## Summary

Memory flush could be skipped when session totals were stale/unknown after stricter `totalTokensFresh` checks.

This PR makes flush gating deterministic by using a fresh projected next-context token count while keeping persisted session accounting semantics clear.

## What changed

### 1) `SessionEntry.totalTokens` stays prompt/context-only and resilient

- `deriveSessionTotalTokens()` now consistently returns prompt/context tokens.
- Session-store writers only persist `totalTokens` when the derived value is finite and > 0.
- When totals are invalid/missing, writers now clear `totalTokens` and mark `totalTokensFresh = false`.
- Cron telemetry includes `usage.total_tokens` only when available.

Files:

- `src/agents/usage.ts`
- `src/auto-reply/reply/session-usage.ts`
- `src/commands/agent/session-store.ts`
- `src/cron/isolated-agent/run.ts`

### 2) Memory flush gates on projected next-context tokens

- Added prompt token estimation for the pending user message.
- Flush gating uses: `projected = promptTokensSnapshot + lastOutputTokens + nextUserPromptEstimate`
- `shouldRunMemoryFlush()` now accepts optional `tokenCount`; when present, that value is used for gating while duplicate-flush suppression still uses `compactionCount` / `memoryFlushCompactionCount`.

Files:

- `src/auto-reply/reply/memory-flush.ts`
- `src/auto-reply/reply/agent-runner-memory.ts`
- `src/auto-reply/reply/agent-runner.ts`

### 3) Transcript fallback + tail-scan performance

In `agent-runner-memory.ts`:

- If persisted totals are stale/unknown, transcript usage is read to recover prompt/output token snapshots for flush decisioning.
- If persisted prompt totals are fresh but near threshold, transcript output tokens are still read to catch threshold flips.
- Transcript reads now tail-scan JSONL in chunks (instead of loading the full file) to find the latest non-zero usage.
- When transcript-derived prompt totals are reliable, they are persisted as `totalTokens` with `totalTokensFresh = true`.

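
For reviewers, here is a minimal sketch of the tail-scan idea from section 3: read the JSONL transcript backwards in fixed-size chunks and stop at the most recent record with non-zero usage. It is not the code in this PR. The names `readLatestUsage` and `parseUsage`, the `UsageSnapshot` shape (`usage.input` / `usage.output`), and the 64 KiB default chunk size are assumptions made for illustration; the real transcript schema may differ.

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

// Hypothetical usage shape, assumed for this sketch only.
interface UsageSnapshot {
  input: number;
  output: number;
}

// Parse one JSONL line; return its usage only when it is finite and non-zero.
function parseUsage(line: string): UsageSnapshot | undefined {
  const trimmed = line.trim();
  if (!trimmed) return undefined;
  try {
    const record = JSON.parse(trimmed) as { usage?: Partial<UsageSnapshot> };
    const input = Number(record.usage?.input ?? 0);
    const output = Number(record.usage?.output ?? 0);
    if (Number.isFinite(input) && Number.isFinite(output) && input + output > 0) {
      return { input, output };
    }
  } catch {
    // Malformed line or non-object record: skip it.
  }
  return undefined;
}

// Scan the file from the end in chunks and return the latest non-zero usage.
function readLatestUsage(filePath: string, chunkSize = 64 * 1024): UsageSnapshot | undefined {
  const fd = fs.openSync(filePath, "r");
  try {
    let position = fs.fstatSync(fd).size;
    // Bytes of a line whose beginning lies in a chunk we have not read yet.
    let carry: Buffer = Buffer.alloc(0);
    while (position > 0) {
      const length = Math.min(chunkSize, position);
      position -= length;
      const chunk = Buffer.alloc(length);
      fs.readSync(fd, chunk, 0, length, position);
      const data = Buffer.concat([chunk, carry]);

      let complete: Buffer = data;
      carry = Buffer.alloc(0);
      if (position > 0) {
        // Splitting on the newline byte is safe: 0x0a never occurs inside a
        // multi-byte UTF-8 sequence.
        const firstNewline = data.indexOf(0x0a);
        if (firstNewline === -1) {
          carry = data; // no line boundary yet, keep accumulating
          continue;
        }
        carry = data.subarray(0, firstNewline); // possibly incomplete first line
        complete = data.subarray(firstNewline + 1);
      }

      // Walk the complete lines newest-first.
      const lines = complete.toString("utf8").split("\n");
      for (let i = lines.length - 1; i >= 0; i--) {
        const usage = parseUsage(lines[i] ?? "");
        if (usage) return usage;
      }
    }
    return undefined;
  } finally {
    fs.closeSync(fd);
  }
}

// Usage example with a throwaway transcript file.
const demoDir = fs.mkdtempSync(path.join(os.tmpdir(), "tail-scan-demo-"));
const demoFile = path.join(demoDir, "session.jsonl");
fs.writeFileSync(
  demoFile,
  ['{"usage":{"input":100,"output":20}}', '{"usage":{"input":0,"output":0}}'].join("\n") + "\n",
);
console.log(readLatestUsage(demoFile));
```

The sketch carries raw bytes rather than decoded strings between chunks, so a character split across a chunk boundary is only decoded once its line is complete. In the common case where the latest usage record sits near the end of the transcript, only the last chunk is read.
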
### 4) Relative `sessionFile` transcript paths are normalized

Relative session transcript paths are normalized through `resolveSessionFilePath(...)` + `resolveSessionFilePathOptions(...)` (including `storePath`) before reading.

This prevents silent fallback failures when session-store entries contain relative `sessionFile` paths.

### 5) Test updates included in this PR

- Added coverage for relative `sessionFile` transcript fallback path handling.
- Updated memory-flush runReplyAgent tests around configured prompts/system prompt behavior.
- Added a strictness fix for `chat.history` test mocking (`sessionKey` optional typing).

Files:

- `src/auto-reply/reply/agent-runner.runreplyagent.test.ts`
- `src/agents/subagent-announce.format.e2e.test.ts`

## Scope clarification

- The earlier subagent-timeout behavior commit was dropped from this PR.
- Current scope is usage + memory-flush behavior and related tests.

## Validation

```bash
pnpm check
```

## AI Disclosure 🤖

- **AI-assisted**: Yes, developed with OpenAI Codex
- **Degree of testing**: Targeted unit tests run locally
- **Human oversight**: [@ManuelHettich](https://github.com/ManuelHettich) reviewed all changes

Co-authored-by: Jarvis <jarvis@medmatic.ai>