#16865: fix(diagnostics-otel): share listeners/transports across module bundles
size: S
Cluster:
Plugin Enhancements and Fixes
## Summary
- Problem: `diagnostics-otel` could miss diagnostic events and/or logs when OpenClaw had multiple module instances loaded (bundle/module isolation), because state was module-local.
- Why it matters: OTEL export appeared enabled, but logs/metrics could silently stop flowing in a real gateway runtime.
- What changed:
  - Shared diagnostic event state via `globalThis` in `src/infra/diagnostic-events.ts`
  - Shared external log transport registry via `globalThis` in `src/logging/logger.ts`
  - `registerLogTransport(...)` now attaches to all active logger instances with idempotent attach tracking
  - Added regression coverage in `src/logger.test.ts` for multi-logger transport attach + unsubscribe
- Scope boundary: no config schema changes, no API/protocol changes.
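
For reviewers unfamiliar with the pattern, here is a minimal sketch of sharing diagnostic event state through `globalThis` so that every copy of the module sees the same listeners and sequence counter. The key name matches the one cited in the review summary; the type and function names (`DiagnosticEvent`, `getState`, `onDiagnosticEvent`, `emitDiagnosticEvent`) are illustrative assumptions and may not match `src/infra/diagnostic-events.ts`.

```typescript
// Sketch only: type and function names here are hypothetical.
type DiagnosticEvent = { type: string; seq: number; ts: number };
type Listener = (event: DiagnosticEvent) => void;

interface DiagnosticState {
  seq: number;
  listeners: Set<Listener>;
}

const STATE_KEY = "__openclaw_diagnostic_events_state__";

// Look the state up on globalThis instead of a module-local variable,
// so a second bundled copy of this module reuses the same object.
function getState(): DiagnosticState {
  const g = globalThis as unknown as Record<string, DiagnosticState | undefined>;
  let state = g[STATE_KEY];
  if (!state) {
    state = { seq: 0, listeners: new Set<Listener>() };
    g[STATE_KEY] = state;
  }
  return state;
}

// Subscribe a listener; the returned function unsubscribes it.
function onDiagnosticEvent(listener: Listener): () => void {
  const { listeners } = getState();
  listeners.add(listener);
  return () => {
    listeners.delete(listener);
  };
}

// Emit to every listener registered through any module copy.
function emitDiagnosticEvent(type: string): void {
  const state = getState();
  const event: DiagnosticEvent = { type, seq: ++state.seq, ts: Date.now() };
  for (const listener of state.listeners) {
    try {
      listener(event);
    } catch {
      // A failing listener must not break the emitter or other listeners.
    }
  }
}
```

Because `getState()` always resolves through `globalThis`, a listener registered by one module copy still receives events emitted by another.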
## Change Type (select all)
- [x] Bug fix
- [ ] Feature
- [x] Refactor
- [ ] Docs
- [ ] Security hardening
- [ ] Chore/infra
## Scope (select all touched areas)
- [x] Gateway / orchestration
- [ ] Skills / tool execution
- [ ] Auth / tokens
- [ ] Memory / storage
- [x] Integrations
- [ ] API / contracts
- [ ] UI / DX
- [ ] CI/CD / infra
## Linked Issue/PR
- Related #5190
- Related #11168
- Related #12475
## User-visible / Behavior Changes
- OTEL diagnostics integration is reliable across module/bundle boundaries.
- Logs and diagnostic events now consistently reach OTEL exporters when the plugin is enabled.
## Security Impact (required)
- New permissions/capabilities? (`No`)
- Secrets/tokens handling changed? (`No`)
- New/changed network calls? (`No`)
- Command/tool execution surface changed? (`No`)
- Data access scope changed? (`No`)
## Repro + Verification
### Environment
- OS: Linux
- Runtime/container: Node 24 (local dev gateway)
- Model/provider: OpenAI Codex (`gpt-5.3-codex`) for traffic generation
- Integration/channel: diagnostics-otel extension + OTLP/HTTP collector
- Relevant config (redacted): `diagnostics-otel` with logs/metrics enabled, OTLP endpoint configured
### Steps
1. Enable `diagnostics-otel` with OTLP logs + metrics.
2. Start gateway and generate normal message traffic.
3. Verify collector counters and backend queries.
### Expected
- Diagnostic events and logs are exported consistently even with module/bundle split.
### Actual
- Collector accepted/sent counters for logs and metrics increase.
- Loki query for `service_name="openclaw-gateway-dev"` returns gateway logs.
## Evidence
- [x] Failing test/log before + passing after
- [x] Trace/log snippets
- [ ] Screenshot/recording
- [ ] Perf numbers (if relevant)
## Human Verification (required)
What I personally verified:
- Build/check pipeline passes locally:
  - `pnpm build`
  - `pnpm check`
- Targeted test suite for touched areas passes:
  - `pnpm vitest run src/logger.test.ts src/infra/infra-store.test.ts src/logging/diagnostic.test.ts extensions/diagnostics-otel/src/service.test.ts`
- Runtime OTEL validation:
  - collector log/metric counters increased
  - Loki query returned `openclaw-gateway-dev` logs
Edge cases checked:
- multi-logger transport attach
- transport unsubscribe behavior
- defensive transport error handling (no throw propagation)
What I did **not** verify:
- full channel/platform matrix end-to-end
## Compatibility / Migration
- Backward compatible? (`Yes`)
- Config/env changes? (`No`)
- Migration needed? (`No`)
## Failure Recovery (if this breaks)
- Revert this PR.
- Symptoms to watch for: OTEL diagnostics exporter enabled but no log/diagnostic traffic reaching the collector.
## Risks and Mitigations
- Risk: duplicate transport attachment or cross-instance confusion.
- Mitigation: global registry + per-logger idempotent attach tracking (`WeakMap`).
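
The mitigation can be sketched roughly as follows, assuming a logger that exposes an `attachTransport(fn)` method. The registry key matches the one cited in the review summary; `LoggerLike`, `trackLogger`, `attachOnce`, and the module-local `activeLoggers` set are hypothetical names used only for illustration, not the actual code in `src/logging/logger.ts`. The `WeakMap` records which transports each logger already has, so repeated registration cannot attach duplicates, and the wrapper checks the shared registry on each call so that unsubscribing takes effect without detaching anything.

```typescript
// Sketch only: LoggerLike and the helper names are assumptions.
type LogRecord = Record<string, unknown>;
type Transport = (record: LogRecord) => void;

interface LoggerLike {
  attachTransport(fn: Transport): void;
}

const TRANSPORTS_KEY = "__openclaw_external_log_transports__";

// Registry of external transports, shared across module copies via globalThis.
function getTransports(): Set<Transport> {
  const g = globalThis as unknown as Record<string, Set<Transport> | undefined>;
  let registry = g[TRANSPORTS_KEY];
  if (!registry) {
    registry = new Set<Transport>();
    g[TRANSPORTS_KEY] = registry;
  }
  return registry;
}

// Per-logger record of already-attached transports; weak keys let
// discarded loggers be garbage collected.
const attached = new WeakMap<LoggerLike, Set<Transport>>();
const activeLoggers = new Set<LoggerLike>();

function attachOnce(logger: LoggerLike, transport: Transport): void {
  let seen = attached.get(logger);
  if (!seen) {
    seen = new Set<Transport>();
    attached.set(logger, seen);
  }
  if (seen.has(transport)) return; // idempotent: already attached
  seen.add(transport);
  logger.attachTransport((record) => {
    if (!getTransports().has(transport)) return; // unsubscribed
    try {
      transport(record);
    } catch {
      // Transport errors must never propagate into the logging path.
    }
  });
}

// Call when a logger is created so it picks up existing transports.
function trackLogger(logger: LoggerLike): void {
  activeLoggers.add(logger);
  for (const transport of getTransports()) attachOnce(logger, transport);
}

// Register a transport on all active loggers; returns an unsubscribe function.
function registerLogTransport(transport: Transport): () => void {
  getTransports().add(transport);
  for (const logger of activeLoggers) attachOnce(logger, transport);
  return () => {
    getTransports().delete(transport);
  };
}
```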
---
AI-assisted: Yes
Testing level: fully tested for touched scope (targeted tests + runtime validation)
<!-- greptile_comment -->
<h3>Greptile Summary</h3>
Moves diagnostic events and log transport state to `globalThis` to fix module/bundle isolation issues preventing OTEL export.
**Key changes:**
- Diagnostic event listeners and sequence counter now shared via `globalThis.__openclaw_diagnostic_events_state__`
- Log transport registry moved to `globalThis.__openclaw_external_log_transports__` with per-logger idempotent attachment tracking via `WeakMap`
- `registerLogTransport` now attaches to all active logger instances, not just the cached one
- Added timestamp validation and defensive error handling in OTEL log export
- New test coverage for multi-logger transport attachment and unsubscribe behavior
The implementation correctly addresses the stated problem (diagnostics-otel missing events across module boundaries) by sharing state globally while maintaining idempotency through `WeakMap` tracking.
<h3>Confidence Score: 4/5</h3>
- Safe to merge with minor consideration for edge cases
- The implementation correctly solves the module isolation problem using established patterns (`globalThis` with typed keys). The `WeakMap` approach for idempotent attachment is sound. Test coverage includes the critical multi-logger scenario. One minor consideration: timestamp validation silently drops invalid timestamps rather than logging warnings, but the try-catch ensures no exceptions propagate.
- No files require special attention
<sub>Last reviewed commit: 160e710</sub>
<!-- greptile_other_comments_section -->
<!-- /greptile_comment -->
Most Similar PRs
- #22478: fix(diagnostics-otel): wire OTLP exporter to emit traffic to config... (by LuffySama-Dev · 2026-02-21 · 85.8%)
- #19353: fix(diagnostics-otel): fix cross-chunk module isolation breaking even… (by nez · 2026-02-17 · 82.0%)
- #21290: feat(diagnostics-otel): OpenTelemetry diagnostics with GenAI semant... (by Baukebrenninkmeijer · 2026-02-19 · 81.0%)
- #11530: diagnostics-otel: fix OpenTelemetry v2 resource/logs API compatibility (by erain · 2026-02-07 · 78.7%)
- #19251: CLI: emit diagnostics for embedded Slack-context runs (by gg2uah · 2026-02-17 · 77.5%)
- #4255: fix(diagnostics-otel): complete OpenTelemetry v2.x compatibility (by arbgjr · 2026-01-29 · 77.2%)
- #12475: fix(logging): use Symbol.for for externalTransports to survive jiti... (by Yida-Dev · 2026-02-09 · 76.8%)
- #18901: feat(diagnostics-otel): add trace context propagation and GenAI sem... (by sergical · 2026-02-17 · 75.2%)
- #18182: fix(security): redact sensitive data in OTEL log exports (CWE-532) (by brandonwise · 2026-02-16 · 73.2%)
- #13957: Enhanced OpenClaw Observability with OTEL Integration (by trevorgordon981 · 2026-02-11 · 72.8%)