
#16865: fix(diagnostics-otel): share listeners/transports across module bundles

by leonnardo open 2026-02-15 06:08
size: S
## Summary

- Problem: `diagnostics-otel` could miss diagnostic events and/or logs when OpenClaw had multiple module instances loaded (bundle/module isolation), because state was module-local.
- Why it matters: OTEL export appeared enabled but logs/metrics could silently stop flowing in real gateway runtime.
- What changed:
  - Shared diagnostic event state via `globalThis` in `src/infra/diagnostic-events.ts`
  - Shared external log transport registry via `globalThis` in `src/logging/logger.ts`
  - `registerLogTransport(...)` now attaches to all active logger instances with idempotent attach tracking
  - Added regression coverage in `src/logger.test.ts` for multi-logger transport attach + unsubscribe
- Scope boundary: no config schema changes, no API/protocol changes.

## Change Type (select all)

- [x] Bug fix
- [ ] Feature
- [x] Refactor
- [ ] Docs
- [ ] Security hardening
- [ ] Chore/infra

## Scope (select all touched areas)

- [x] Gateway / orchestration
- [ ] Skills / tool execution
- [ ] Auth / tokens
- [ ] Memory / storage
- [x] Integrations
- [ ] API / contracts
- [ ] UI / DX
- [ ] CI/CD / infra

## Linked Issue/PR

- Related #5190
- Related #11168
- Related #12475

## User-visible / Behavior Changes

- OTEL diagnostics integration is reliable across module/bundle boundaries.
- Logs and diagnostics events now consistently reach OTEL exporters when plugin is enabled.

## Security Impact (required)

- New permissions/capabilities? (`No`)
- Secrets/tokens handling changed? (`No`)
- New/changed network calls? (`No`)
- Command/tool execution surface changed? (`No`)
- Data access scope changed? (`No`)

## Repro + Verification

### Environment

- OS: Linux
- Runtime/container: Node 24 (local dev gateway)
- Model/provider: OpenAI Codex (`gpt-5.3-codex`) for traffic generation
- Integration/channel: diagnostics-otel extension + OTLP/HTTP collector
- Relevant config (redacted): `diagnostics-otel` with logs/metrics enabled, OTLP endpoint configured

### Steps

1. Enable `diagnostics-otel` with OTLP logs + metrics.
2. Start gateway and generate normal message traffic.
3. Verify collector counters and backend queries.

### Expected

- Diagnostic events and logs are exported consistently even with module/bundle split.

### Actual

- Collector accepted/sent counters for logs and metrics increase.
- Loki query for `service_name="openclaw-gateway-dev"` returns gateway logs.

## Evidence

- [x] Failing test/log before + passing after
- [x] Trace/log snippets
- [ ] Screenshot/recording
- [ ] Perf numbers (if relevant)

## Human Verification (required)

What I personally verified:

- Build/check pipeline passes locally:
  - `pnpm build`
  - `pnpm check`
- Targeted test suite for touched areas passes:
  - `pnpm vitest run src/logger.test.ts src/infra/infra-store.test.ts src/logging/diagnostic.test.ts extensions/diagnostics-otel/src/service.test.ts`
- Runtime OTEL validation:
  - collector log/metric counters increased
  - Loki query returned `openclaw-gateway-dev` logs

Edge cases checked:

- multi-logger transport attach
- transport unsubscribe behavior
- defensive transport error handling (no throw propagation)

What I did **not** verify:

- full channel/platform matrix end-to-end

## Compatibility / Migration

- Backward compatible? (`Yes`)
- Config/env changes? (`No`)
- Migration needed? (`No`)

## Failure Recovery (if this breaks)

- Revert this PR.
- Symptoms to watch for: OTEL diagnostics exporter enabled but no log/diagnostic traffic reaching collector.

## Risks and Mitigations

- Risk: duplicate transport attachment or cross-instance confusion.
  - Mitigation: global registry + per-logger idempotent attach tracking (`WeakMap`).

---

AI-assisted: Yes

Testing level: fully tested for touched scope (targeted tests + runtime validation)

<!-- greptile_comment -->

<h3>Greptile Summary</h3>

Moves diagnostic events and log transport state to `globalThis` to fix module/bundle isolation issues preventing OTEL export.

**Key changes:**

- Diagnostic event listeners and sequence counter now shared via `globalThis.__openclaw_diagnostic_events_state__`
- Log transport registry moved to `globalThis.__openclaw_external_log_transports__` with per-logger idempotent attachment tracking via `WeakMap`
- `registerLogTransport` now attaches to all active logger instances, not just the cached one
- Added timestamp validation and defensive error handling in OTEL log export
- New test coverage for multi-logger transport attachment and unsubscribe behavior

The implementation correctly addresses the stated problem (diagnostics-otel missing events across module boundaries) by sharing state globally while maintaining idempotency through `WeakMap` tracking.

<h3>Confidence Score: 4/5</h3>

- Safe to merge with minor consideration for edge cases
- The implementation correctly solves the module isolation problem using established patterns (`globalThis` with typed keys). The `WeakMap` approach for idempotent attachment is sound. Test coverage includes the critical multi-logger scenario. One minor consideration: timestamp validation silently drops invalid timestamps rather than logging warnings, but the try-catch ensures no exceptions propagate.
- No files require special attention

<sub>Last reviewed commit: 160e710</sub>

<!-- greptile_other_comments_section -->

<!-- /greptile_comment -->
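For readers unfamiliar with the pattern, here is a minimal sketch of a `globalThis`-backed transport registry with per-logger idempotent attachment tracked in a `WeakMap`. It is not the code from this PR: the state key, the `ExampleLogger` interface, `createLogger`, and the shape of the shared state are all hypothetical, and the real `registerLogTransport` in `src/logging/logger.ts` may differ in signature and behavior. The sketch also keeps loggers in a plain `Set` for simplicity, which the actual change may not do.

```typescript
// Hypothetical sketch: names and shapes below are assumptions, not the PR's code.
type ExampleLogRecord = { level: string; message: string };
type LogTransport = (record: ExampleLogRecord) => void;

interface ExampleLogger {
  // Adds a sink and returns a function that detaches it again.
  attach(sink: LogTransport): () => void;
  log(record: ExampleLogRecord): void;
}

interface SharedLogState {
  transports: Set<LogTransport>;
  loggers: Set<ExampleLogger>;
  // Per logger: which transports are attached, and how to detach each one.
  attached: WeakMap<ExampleLogger, Map<LogTransport, () => void>>;
}

const STATE_KEY = "__example_shared_log_state__"; // hypothetical key

// State lives on globalThis, so every bundled copy of this module
// resolves to the same object instead of its own module-local variable.
function getSharedState(): SharedLogState {
  const host = globalThis as unknown as Record<string, SharedLogState | undefined>;
  let state = host[STATE_KEY];
  if (!state) {
    state = { transports: new Set(), loggers: new Set(), attached: new WeakMap() };
    host[STATE_KEY] = state;
  }
  return state;
}

function attachOnce(logger: ExampleLogger, transport: LogTransport): void {
  const state = getSharedState();
  let perLogger = state.attached.get(logger);
  if (!perLogger) {
    perLogger = new Map();
    state.attached.set(logger, perLogger);
  }
  if (perLogger.has(transport)) return; // already attached: no-op
  const detach = logger.attach((record) => {
    try {
      transport(record);
    } catch {
      // A failing transport must not break logging for the caller.
    }
  });
  perLogger.set(transport, detach);
}

function createLogger(): ExampleLogger {
  const sinks = new Set<LogTransport>();
  const logger: ExampleLogger = {
    attach(sink) {
      sinks.add(sink);
      return () => {
        sinks.delete(sink);
      };
    },
    log(record) {
      sinks.forEach((sink) => sink(record));
    },
  };
  const state = getSharedState();
  state.loggers.add(logger);
  // Loggers created later still pick up transports registered earlier.
  state.transports.forEach((transport) => attachOnce(logger, transport));
  return logger;
}

function registerLogTransport(transport: LogTransport): () => void {
  const state = getSharedState();
  state.transports.add(transport);
  state.loggers.forEach((logger) => attachOnce(logger, transport));
  return () => {
    state.transports.delete(transport);
    state.loggers.forEach((logger) => {
      const perLogger = state.attached.get(logger);
      const detach = perLogger?.get(transport);
      if (detach) {
        detach();
        perLogger?.delete(transport);
      }
    });
  };
}

// Usage: two loggers, one transport registered twice, then unsubscribed.
const seen: string[] = [];
const loggerA = createLogger();
const loggerB = createLogger();
const transport: LogTransport = (record) => {
  seen.push(record.message);
};
const unsubscribe = registerLogTransport(transport);
registerLogTransport(transport); // second registration attaches nothing new
loggerA.log({ level: "info", message: "from a" });
loggerB.log({ level: "info", message: "from b" });
unsubscribe();
loggerA.log({ level: "info", message: "after unsubscribe" });
```

In this sketch each message is delivered once per logger even though the transport was registered twice, and nothing is delivered after `unsubscribe()`. The same keyed-`globalThis` lookup could hold the diagnostic event listeners and sequence counter described in the summary.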
