#21290: feat(diagnostics-otel): OpenTelemetry diagnostics with GenAI semantic conventions

by Baukebrenninkmeijer open 2026-02-19 21:50

docs extensions: diagnostics-otel commands agents size: XL

## Summary

Upgrades the `@openclaw/diagnostics-otel` exporter to produce structured, per-call telemetry aligned with [OpenTelemetry GenAI semantic conventions](https://opentelemetry.io/docs/specs/semconv/gen-ai/).

- **Run-level parent span** (`openclaw.agent.turn`) per agent turn
- **Per-inference spans** for each LLM call (initial, post-tool followups, loops)
- **Tool execution spans** with `gen_ai.tool.*` attributes
- **Opt-in content capture** (`diagnostics.otel.captureContent`) for messages and tool I/O
- **GenAI metrics**: `gen_ai.client.operation.duration`, `gen_ai.client.time_to_first_token`, `gen_ai.client.token.usage`

## Event model

Replaces the monolithic `model.usage` event with a structured lifecycle:

1. `run.started`: agent turn begins
2. `model.inference.started`: LLM call begins (captures input messages, system instructions, tool definitions)
3. `model.inference`: LLM call ends (duration, TTFT, usage, output messages)
4. `tool.execution`: tool call (duration, errors, optional I/O)
5. `run.completed`: agent turn ends (aggregate usage, cost, duration)

## Key design decisions

- Input messages captured at the actual model-call boundary (not from streaming state)
- Content capture gated behind `diagnostics.otel.captureContent`; when disabled, spans still include timings/usage/errors
- Provider names normalized to GenAI enum (`openai`, `anthropic`, `gcp.gemini`, etc.)
- `Symbol.for()` used for global diagnostic state key (better cross-module isolation)
- Recursion guard (`dispatchDepth`) retained for diagnostic event dispatch safety

## Test plan

- [ ] `pnpm vitest run extensions/diagnostics-otel/src/service.test.ts`
- [ ] `pnpm vitest run extensions/diagnostics-otel/src/service.metrics.test.ts`
- [ ] `pnpm vitest run extensions/diagnostics-otel/src/service.spans.test.ts`
- [ ] `pnpm vitest run src/agents/pi-embedded-subscribe.subscribe-embedded-pi-session.emits-diagnostic-tool-execution-events.test.ts`
- [ ] `pnpm vitest run src/agents/pi-embedded-subscribe.subscribe-embedded-pi-session.emits-diagnostic-sessionkey.test.ts`
- [ ] `pnpm vitest run src/commands/agent.diagnostics.test.ts`
- [ ] `npx tsc --noEmit` passes (only pre-existing e2e test errors)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

<h3>Greptile Summary</h3>

This PR upgrades the `@openclaw/diagnostics-otel` extension to align with OpenTelemetry GenAI semantic conventions, adding structured per-call telemetry with proper span hierarchies and GenAI-standard metrics.

**Major changes:**

- Replaces monolithic `model.usage` events with lifecycle events: `run.started`, `model.inference.started`, `model.inference`, `tool.execution`, `run.completed`
- Implements 3-level span hierarchy: root `openclaw.message` span → `invoke_agent` turn spans → `chat` LLM call spans and `execute_tool` spans
- Adds GenAI metrics: `gen_ai.client.operation.duration`, `gen_ai.client.time_to_first_token`, `gen_ai.client.token.usage`
- Implements W3C Trace Context propagation via shared Symbol registry, injecting `traceparent`/`tracestate` headers into LLM requests
- Adds granular opt-in content capture via `diagnostics.otel.captureContent` config (strict opt-in per field after developer feedback)
- Normalizes provider names to GenAI enum values (`openai`, `anthropic`, `gcp.gemini`, etc.)
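As a rough illustration of the strict per-field opt-in for content capture mentioned above, the sketch below builds span attributes for one LLM call. It is not the PR's code: `CaptureContentConfig`, `InferenceEvent`, `buildInferenceAttributes`, the per-field flag names, and the `openclaw.duration_ms` attribute are all hypothetical, and the `gen_ai.*` attribute names are taken from the GenAI semantic conventions and may differ from what the extension actually emits.

```typescript
// Hypothetical config shape: one flag per content field. Only an explicit
// `true` enables capture; `undefined` and `false` both mean "do not record".
interface CaptureContentConfig {
  inputMessages?: boolean;
  outputMessages?: boolean;
  systemInstructions?: boolean;
}

// Hypothetical payload describing one finished LLM call.
interface InferenceEvent {
  provider: string;
  model: string;
  durationMs: number;
  inputTokens: number;
  outputTokens: number;
  inputMessages?: unknown[];
  outputMessages?: unknown[];
  systemInstructions?: string;
}

type SpanAttributes = Record<string, string | number>;

function buildInferenceAttributes(
  event: InferenceEvent,
  capture: CaptureContentConfig = {},
): SpanAttributes {
  // Timings and usage are always recorded, whether or not content is captured.
  const attrs: SpanAttributes = {
    "gen_ai.operation.name": "chat",
    "gen_ai.provider.name": event.provider,
    "gen_ai.request.model": event.model,
    "gen_ai.usage.input_tokens": event.inputTokens,
    "gen_ai.usage.output_tokens": event.outputTokens,
    "openclaw.duration_ms": event.durationMs, // hypothetical attribute name
  };

  // Content fields are attached only when their own flag is exactly `true`.
  if (capture.inputMessages === true && event.inputMessages !== undefined) {
    attrs["gen_ai.input.messages"] = JSON.stringify(event.inputMessages);
  }
  if (capture.outputMessages === true && event.outputMessages !== undefined) {
    attrs["gen_ai.output.messages"] = JSON.stringify(event.outputMessages);
  }
  if (capture.systemInstructions === true && event.systemInstructions !== undefined) {
    attrs["gen_ai.system_instructions"] = event.systemInstructions;
  }
  return attrs;
}
```

Comparing against `true` rather than testing truthiness keeps a missing or malformed flag from enabling capture, which is one plausible reading of "strict opt-in per field".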
**Architecture:**

- Event handlers separated into `otel-event-handlers.ts` for maintainability
- Diagnostic builders in `diagnostic-builders.ts` convert agent messages to GenAI format
- Global state uses `Symbol.for()` for cross-module isolation
- Recursion guard retained for diagnostic safety
- TTL-based cleanup for orphaned spans/traces (10 min)

**Test coverage:**

- Comprehensive unit tests for spans, metrics, and content capture modes
- Integration tests for diagnostic event emission
- Tests verify strict opt-in behavior for content capture

<h3>Confidence Score: 4/5</h3>

- This PR is safe to merge with minor observations noted
- The implementation is well-structured with comprehensive test coverage, proper error handling, and clear separation of concerns. The code follows OpenTelemetry best practices and the PR description accurately reflects the changes. One previous review comment about strict opt-in semantics was already addressed by the developer. The changes are substantial but well-isolated to the diagnostics extension with minimal impact on core agent logic.
- No files require special attention; the implementation is solid across all changed files

<sub>Last reviewed commit: 9301770</sub>
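
For readers unfamiliar with the `Symbol.for()` global state key and the `dispatchDepth` recursion guard referred to in the description and review, a minimal sketch of how the two can fit together follows. The key string, `MAX_DISPATCH_DEPTH`, and the helper names (`getState`, `onDiagnosticEvent`, `emitDiagnosticEvent`) are assumptions made for illustration, and dropping nested events is only one possible guard policy; the PR's actual implementation may differ.

```typescript
// Hypothetical key string. Symbol.for() returns the same symbol for the same
// string anywhere in the process, so duplicated copies of this module still
// resolve to a single shared state object on globalThis.
const STATE_KEY = Symbol.for("openclaw.diagnostics.state");

type DiagnosticEvent = { type: string; [key: string]: unknown };
type Listener = (event: DiagnosticEvent) => void;

interface DiagnosticState {
  listeners: Set<Listener>;
  dispatchDepth: number;
}

// Assumption: no nested dispatch is allowed at all.
const MAX_DISPATCH_DEPTH = 1;

function getState(): DiagnosticState {
  const holder = globalThis as unknown as Record<symbol, DiagnosticState | undefined>;
  let state = holder[STATE_KEY];
  if (!state) {
    state = { listeners: new Set<Listener>(), dispatchDepth: 0 };
    holder[STATE_KEY] = state;
  }
  return state;
}

// Registers a listener and returns a function that unregisters it.
function onDiagnosticEvent(listener: Listener): () => void {
  const state = getState();
  state.listeners.add(listener);
  return () => {
    state.listeners.delete(listener);
  };
}

// Returns false when the event was dropped by the recursion guard.
function emitDiagnosticEvent(event: DiagnosticEvent): boolean {
  const state = getState();
  // An exporter that emits diagnostics while handling one would otherwise
  // re-enter this function without bound.
  if (state.dispatchDepth >= MAX_DISPATCH_DEPTH) return false;
  state.dispatchDepth += 1;
  try {
    for (const listener of Array.from(state.listeners)) {
      try {
        listener(event);
      } catch {
        // A failing listener should not break the caller or other listeners.
      }
    }
  } finally {
    state.dispatchDepth -= 1;
  }
  return true;
}
```

In this sketch, a listener that calls `emitDiagnosticEvent` while handling an event gets `false` back and the nested event is not delivered; the depth counter returns to zero once the outer dispatch finishes.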