#16856: feat(agents): add tool execution duration tracking with configurable transcript normalization

by EunHyeokJung · open · 2026-02-15 05:43

Labels: `channel: discord` · `channel: telegram` · `gateway` · `cli` · `agents` · `size: L`

## What / Why

**New capability:** capture tool execution duration and keep transcript duration fields consistent across success and failure paths.

**Note:** `durationMs` already exists as a field name elsewhere in `main`, but `main` has no end-to-end pipeline that tracks toolResult durations and normalizes them at persistence time.

**Goal:** improve observability and reduce the debugging friction caused by inconsistent duration placement.

lobster-biscuit!

## Behavior

### Default (enabled)

- Persisted toolResult records carry duration fields, and those fields are normalized on both success and failure paths.

### Disabled

- No duration enrichment or normalization is applied.

### Duration definition

- Wall-clock time of the tool execution call (transcript persistence is excluded).

### Persistence-only normalization

- Normalization runs only at persistence time (`transformToolResultForPersistence`), not on the live in-memory message flow.

## Config

- Added: `agents.defaults.toolResultDurations.enabled`
- Default: `true`
- Set to `false` to opt out without code changes.

## Duration fields covered (when enabled)

- `durationMs`
- `metadata.durationMs`
- `details.durationMs`
- `details.metadata.durationMs`

## Normalization rules

- Canonical source: `durationMs`.
- If the canonical value exists, synchronize the other duration fields to it.
- If the canonical value is missing, do not force-write any duration field.
- If only nested duration fields exist without a top-level `durationMs`, this PR does not promote them (this avoids an implicit data rewrite).
- If disabled, do nothing.

## Security / Scope

- No new network or permission surface.
- No sensitive payload expansion; the added data is timing telemetry only.
- Scope is limited to duration telemetry and transcript consistency.
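The normalization rules can be sketched as a pure function. This is a minimal illustration, not the code in this PR: `ToolResultRecord`, `normalizeToolResultDurations`, and its `enabled` parameter are hypothetical names, and the sketch assumes that "synchronize" means overwriting only nested duration fields that are already present. It returns a new object instead of mutating its input.

```typescript
// Hypothetical shapes for a persisted toolResult record, based only on the
// four duration field paths listed in this PR.
type Meta = { durationMs?: number; [key: string]: unknown };

interface ToolResultRecord {
  durationMs?: number;
  metadata?: Meta;
  details?: { durationMs?: number; metadata?: Meta; [key: string]: unknown };
  [key: string]: unknown;
}

// Copy a metadata object, overwriting durationMs only if it is already set.
function syncMeta(meta: Meta, value: number): Meta {
  return meta.durationMs !== undefined ? { ...meta, durationMs: value } : meta;
}

// Root durationMs is canonical. Nested duration fields that exist are
// synchronized to it; nothing is promoted or force-written. The input
// record is never mutated.
function normalizeToolResultDurations(
  record: ToolResultRecord,
  enabled: boolean,
): ToolResultRecord {
  if (!enabled) return record; // disabled: no-op
  const canonical = record.durationMs;
  if (typeof canonical !== "number") return record; // no canonical value: leave as-is

  const out: ToolResultRecord = { ...record };
  if (record.metadata) out.metadata = syncMeta(record.metadata, canonical);
  if (record.details) {
    const details = { ...record.details };
    if (details.durationMs !== undefined) details.durationMs = canonical;
    if (details.metadata) details.metadata = syncMeta(details.metadata, canonical);
    out.details = details;
  }
  return out;
}
```

Returning the original object on the disabled and missing-canonical paths keeps them strict no-ops, which matches the "do nothing" and "do not promote" rules.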
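The duration definition (wall-clock time of the tool call only, on both success and failure paths) can be illustrated with a small wrapper. `runToolTimed` and `TimedResult` are hypothetical names, and the use of `Date.now()` is an assumption; the PR text does not say which clock it uses. Transcript persistence would happen after the wrapper returns, so it falls outside the measured window.

```typescript
// Hypothetical result shape; the PR's real toolResult type is not shown.
interface TimedResult<T> {
  ok: boolean;
  value?: T;
  error?: unknown;
  durationMs: number; // elapsed wall-clock milliseconds for the call
}

// Times only the tool call. Anything done with the result afterwards
// (such as transcript persistence) is outside the measured window.
async function runToolTimed<T>(call: () => Promise<T>): Promise<TimedResult<T>> {
  const start = Date.now();
  try {
    const value = await call();
    return { ok: true, value, durationMs: Date.now() - start };
  } catch (error) {
    // Failure path: still report how long the call ran before throwing.
    return { ok: false, error, durationMs: Date.now() - start };
  }
}
```

`Date.now()` can jump if the system clock is adjusted; `performance.now()` is monotonic and would be a reasonable alternative where it is available.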
## Tests

- `src/agents/session-tool-result-guard.e2e.test.ts`
- `src/agents/pi-tool-definition-adapter.e2e.test.ts`
- `src/agents/pi-embedded-runner.splitsdktools.e2e.test.ts`
- `src/agents/tool-result-durations.test.ts`
- `pnpm lint`: pass
- `pnpm check`: pass
- `pnpm build`: pass
- Targeted tests: pass

## Manual verification

### Local agent run plus sandbox onboarding

- `enabled=true`: both success and failure toolResult records show normalized duration fields in persisted transcripts.
- `enabled=false`: the duration enrichment/normalization path is skipped, as expected.

### Evidence (brief)

- `enabled=true` => `durationMs` is present and the nested fields match it.
- `enabled=false` => no duration enrichment or normalization.

## Failure recovery

Set `agents.defaults.toolResultDurations.enabled=false` and restart the gateway to disable the feature immediately.

## Search (avoid duplicates)

- Repo search terms: `durationMs`, `toolResult`, `transformToolResultForPersistence`
- Issue context considered: `tool_use`/`tool_result` mismatch and transcript consistency threads

## AI assistance disclosure

This PR was AI-assisted. I reviewed and understand the changes.

<h3>Greptile Summary</h3>

Added end-to-end tool execution duration tracking with configurable persistence-time transcript normalization. Enabled by default via `agents.defaults.toolResultDurations.enabled` (set to `false` to opt out). Captures wall-clock duration for tool execution calls and normalizes duration fields (`durationMs`, `metadata.durationMs`, `details.durationMs`, `details.metadata.durationMs`) at persistence time to ensure consistency across success and failure paths. Normalization treats root `durationMs` as canonical and synchronizes other duration fields when present. Does not promote nested-only durations, to avoid implicit data rewrites.

<h3>Confidence Score: 5/5</h3>

- This PR is safe to merge with minimal risk.
- All previously identified issues have been resolved with proper immutability guarantees and aligned duration resolution logic. The implementation is well-tested, with coverage for success and failure paths, edge cases, and configuration options. The feature is opt-out (enabled by default) with clear configuration, follows repository conventions, and includes targeted test additions with no unintended side effects.
- No files require special attention.

<sub>Last reviewed commit: c363b2b</sub>