
#18901: feat(diagnostics-otel): add trace context propagation and GenAI semantic conventions

by sergical open 2026-02-17 05:26
extensions: diagnostics-otel size: M
## Summary

This PR adds two related improvements to the diagnostics-otel plugin:

1. **Trace context propagation**: Diagnostic events now carry `traceId` and `parentSpanId` fields, enabling the OTel plugin to create proper parent-child span relationships instead of disconnected root spans.
2. **GenAI semantic convention attributes**: Model usage spans now include standardized `gen_ai.*` attributes alongside existing `openclaw.*` attributes, following the [OTel GenAI semantic conventions](https://opentelemetry.io/docs/specs/semconv/gen-ai/).

## Trace Hierarchy

Before (all root spans, unlinked):

```
openclaw.message.processed  (standalone)
openclaw.model.usage        (standalone)
openclaw.model.usage        (standalone)
```

After (linked parent-child traces):

```
openclaw.message.processed      ← parent span (traceId + spanId)
  └── chat claude-opus-4-6      ← child span (same traceId, parentSpanId = parent's spanId)
  └── chat claude-opus-4-6      ← child span (same traceId, parentSpanId = parent's spanId)
```

The trace context lifecycle:

1. `logWebhookReceived` generates a 32-hex-char `traceId` (UUID without dashes)
2. `logMessageQueued` stores the `traceId` on the session state when provided
3. `logMessageProcessed` reads the session's `traceId`, generates a 16-hex-char `spanId`, and emits both on the event
4. Model usage events inherit `traceId` and `parentSpanId` from the session state
5. The OTel plugin uses `trace.setSpanContext()` to create child spans under the proper parent

## GenAI Convention Attributes Added

On `model.usage` spans (alongside existing `openclaw.*` attributes):

| Attribute | Value | Source |
|-----------|-------|--------|
| `gen_ai.operation.name` | `"chat"` | Static |
| `gen_ai.system` | Provider name | `evt.provider` |
| `gen_ai.request.model` | Model name | `evt.model` |
| `gen_ai.usage.input_tokens` | Input token count | `evt.usage.input` |
| `gen_ai.usage.output_tokens` | Output token count | `evt.usage.output` |

Span names updated for GenAI conventions:

- Model usage: `chat ${model}` (e.g., `chat claude-opus-4-6`)
- Message processed: unchanged (`openclaw.message.processed`)

## Files Changed

| File | Change |
|------|--------|
| `src/infra/diagnostic-events.ts` | Added optional `traceId` and `parentSpanId` to `DiagnosticBaseEvent` |
| `src/logging/diagnostic-session-state.ts` | Added `traceId` and `currentSpanId` to `SessionState` |
| `src/logging/diagnostic.ts` | Generate and propagate trace context in webhook/message lifecycle |
| `extensions/diagnostics-otel/src/service.ts` | Accept parent context in span creation, add GenAI attributes, update span names |
| `src/infra/diagnostic-events.test.ts` | New: verify trace context fields pass through events |
| `extensions/diagnostics-otel/src/service.test.ts` | Added tests for GenAI attributes and trace context linking |

## Backwards Compatibility

- **Fully backwards compatible**: All trace context fields are optional (`traceId?: string`, `parentSpanId?: string`)
- **No attributes removed**: `openclaw.*` attributes remain on all spans; `gen_ai.*` attributes are added alongside
- **No breaking changes to event types**: `DiagnosticEventInput` omits `ts` and `seq` as before; the new fields are optional in the intersection types
- **Span creation fallback**: When no `traceId` is present, spans are created as root spans (existing behavior)
- The `logMessageQueued` function's `traceId` parameter is optional; existing callers don't need changes

## How to Test

1. **Unit tests:**

   ```bash
   npx vitest run src/infra/diagnostic-events.test.ts
   npx vitest run extensions/diagnostics-otel/src/service.test.ts
   ```

2. **Manual verification with a collector:**
   - Configure `diagnostics.otel.endpoint` to point at a local OTLP collector or Jaeger
   - Send a message through any channel
   - Verify in the trace UI that `openclaw.message.processed` and `chat <model>` spans share the same `traceId` and have a parent-child relationship
   - Verify `gen_ai.*` attributes appear on model usage spans

3. **Backwards compat check:**
   - Run the full test suite to confirm no regressions
   - Verify `openclaw.*` attributes still appear on all spans

<!-- greptile_comment -->

<h3>Greptile Summary</h3>

This PR adds trace context propagation and GenAI semantic convention attributes to the diagnostics-otel plugin. It introduces `traceId`/`parentSpanId` fields on diagnostic events, stores trace context on session state, and adds `gen_ai.*` attributes alongside existing `openclaw.*` attributes on model usage spans. Span names for model usage are updated to follow GenAI conventions (`chat <model>`).

- **Trace context infrastructure**: `DiagnosticBaseEvent` gains optional `traceId` and `parentSpanId`; `SessionState` gains `traceId` and `currentSpanId`. The OTel plugin's `spanWithDuration` now accepts parent context and uses `trace.setSpanContext` to establish parent-child links.
- **GenAI semantic conventions**: Model usage spans include `gen_ai.operation.name`, `gen_ai.system`, `gen_ai.request.model`, `gen_ai.usage.input_tokens`, and `gen_ai.usage.output_tokens` per the OTel GenAI spec.
- **Trace hierarchy concern**: `logMessageProcessed` emits its own generated `spanId` as `parentSpanId`, causing the OTel plugin to create a self-referential parent span rather than the intended root span. See inline comment for details.
- **End-to-end wiring gap**: No existing callers of `logMessageQueued` pass `traceId`, and `model.usage` emitters don't pass trace context, so the propagation chain is incomplete in practice. This may be intentional as groundwork for a follow-up PR.

<h3>Confidence Score: 3/5</h3>

- The PR is backwards-compatible and low-risk for regressions, but the trace hierarchy logic has a bug that will produce incorrect span relationships.
- The GenAI attribute additions and event type changes are clean and backwards-compatible. However, the trace context propagation has a logic issue where message.processed spans become self-referential instead of root spans, and the end-to-end wiring through callers is incomplete. These issues don't break existing functionality but mean the new trace linking feature won't produce the intended parent-child hierarchy.
- `src/logging/diagnostic.ts`: the `parentSpanId` emitted on message.processed events creates a self-referential span parent instead of the intended trace hierarchy.

<sub>Last reviewed commit: 52ad825</sub>

<!-- greptile_other_comments_section -->

<!-- /greptile_comment -->
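
For reference, the trace context lifecycle from the PR description (steps 1 to 4) can be sketched roughly as below. This is a minimal illustration, not the PR's code: `SessionTraceState`, `TraceFields`, `newTraceId`, `newSpanId`, `onMessageQueued`, `onMessageProcessed`, and `modelUsageContext` are hypothetical names, and the sketch assumes the intended hierarchy in which the message-processed span records its own `spanId` while only model usage events carry it as `parentSpanId`.

```typescript
import { randomBytes, randomUUID } from "node:crypto";

// Hypothetical shapes for illustration only; the PR's real types live in
// src/infra/diagnostic-events.ts and src/logging/diagnostic-session-state.ts.
interface SessionTraceState {
  traceId?: string;
  currentSpanId?: string;
}

interface TraceFields {
  traceId?: string;
  spanId?: string;
  parentSpanId?: string;
}

// Step 1: a 32-hex-char trace ID, i.e. a UUID with its dashes removed.
function newTraceId(): string {
  return randomUUID().replace(/-/g, "");
}

// Step 3: a 16-hex-char span ID (8 random bytes).
function newSpanId(): string {
  return randomBytes(8).toString("hex");
}

// Step 2: store the trace ID on the session only when one is provided.
function onMessageQueued(state: SessionTraceState, traceId?: string): void {
  if (traceId) state.traceId = traceId;
}

// Step 3: the processed-message span gets a fresh span ID of its own.
// Assumption: it is emitted as `spanId`, not as `parentSpanId`, so the span
// is not its own parent.
function onMessageProcessed(state: SessionTraceState): TraceFields {
  if (!state.traceId) return {}; // no trace context: fall back to a root span
  state.currentSpanId = newSpanId();
  return { traceId: state.traceId, spanId: state.currentSpanId };
}

// Step 4: model usage events inherit the trace ID and point at the
// processed-message span as their parent.
function modelUsageContext(state: SessionTraceState): TraceFields {
  const fields: TraceFields = {};
  if (state.traceId) fields.traceId = state.traceId;
  if (state.currentSpanId) fields.parentSpanId = state.currentSpanId;
  return fields;
}

// Usage: one session, one processed message, one model call.
const session: SessionTraceState = {};
onMessageQueued(session, newTraceId());
const processed = onMessageProcessed(session);
const usage = modelUsageContext(session);
console.log(processed.traceId === usage.traceId, processed.spanId === usage.parentSpanId);
```

In step 5 the plugin would turn `traceId` and `parentSpanId` into a parent span context before starting the child span; that part is left out here because it depends on the OpenTelemetry API package.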

Most Similar PRs