#11304: feishu: cache bot info to reduce probe API calls (feishu set low quota)

by jasonthewhale open 2026-02-07 17:38

channel: feishu stale size: S

Cluster: Feishu Integration Enhancements

Fixes #10549
Fixes #10041

## Summary

The gateway's periodic health refresh (`HEALTH_REFRESH_INTERVAL_MS = 60000`) calls `probeFeishu()` every 60 seconds, which hits `GET /open-apis/bot/v3/info` each time. On Feishu's free tier the monthly "basic API call" quota is 10,000; the probe alone burns through ~43,200 calls/month (1,440/day), exhausting the quota in under a week and causing Error 99991403.

### Why Feishu needs this and other channels don't

All channels (Discord, Telegram, Slack, etc.) share the same 60-second probe pattern, but only Feishu enforces a hard **monthly API call cap**. Discord, Telegram, and Slack use per-second rate limits with no monthly ceiling, so their probes are effectively free. Feishu's free tier counts every server API call toward a shared 10,000/month budget.

## What this PR does

After the first successful `bot/v3/info` call per account, cache the bot metadata (`botName`, `botOpenId`) in memory. Subsequent probes validate connectivity via the Lark SDK's internal `TokenManager.getTenantAccessToken()`, which:

- Returns from in-memory cache most of the time (~2 h TTL)
- On cache miss, refreshes via `POST /auth/v3/tenant_access_token/internal`, an auth infrastructure endpoint that does not count toward the basic API call quota

| | Before | After |
|---|---|---|
| `bot/v3/info` calls/day | ~1,440 | 1 (startup) |
| Token refreshes/day | ~12 (SDK internal) | ~12 (unchanged, quota-exempt) |
| Monthly quota usage | ~43,200 | ~1 |

No change to bot metadata freshness: bot name and open_id are stable per app lifecycle and re-fetched on every gateway restart.
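The probe flow described above can be sketched roughly as follows. This is a minimal illustration, not the PR's actual code: `probeWithCache`, `ProbeDeps`, and `botInfoCache` are hypothetical names, and both network calls are injected as plain functions so the sketch does not depend on the real Lark SDK. Treating an empty token as a failure and falling back to the full probe is an assumption about how a failed token check might be handled.

```typescript
// Sketch only: every name here is hypothetical, and the two dependencies are
// injected so that nothing in this block calls the real Lark SDK.
type BotInfo = { botName: string; botOpenId: string };
type ProbeResult = { ok: true; bot: BotInfo } | { ok: false; error: string };

interface ProbeDeps {
  // Stands in for GET /open-apis/bot/v3/info (counts toward the quota).
  fetchBotInfo: () => Promise<BotInfo>;
  // Stands in for the SDK token manager lookup (described as quota-exempt).
  getTenantAccessToken: () => Promise<string | undefined>;
}

const botInfoCache = new Map<string, BotInfo>();

async function probeWithCache(
  accountKey: string,
  deps: ProbeDeps,
): Promise<ProbeResult> {
  const cached = botInfoCache.get(accountKey);
  if (cached) {
    try {
      const token = await deps.getTenantAccessToken();
      // Only a non-empty string counts as proof that the credentials work.
      if (typeof token === "string" && token.length > 0) {
        return { ok: true, bot: cached };
      }
    } catch {
      // Ignore the error here and fall through to the full probe.
    }
    // The token check failed, so drop the entry before probing again.
    botInfoCache.delete(accountKey);
  }

  try {
    const bot = await deps.fetchBotInfo();
    botInfoCache.set(accountKey, bot);
    return { ok: true, bot };
  } catch (err) {
    return { ok: false, error: err instanceof Error ? err.message : String(err) };
  }
}
```

With this shape, repeated probes for the same account call `fetchBotInfo` once and afterwards only consult the token lookup, which is the behavior the Before/After table summarizes.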
## Test plan

- [x] `pnpm build` passes
- [x] `pnpm lint` passes (0 warnings, 0 errors)
- [x] `pnpm test`: 840 passed, 8 failed (pre-existing memory/LanceDB failures on main, unrelated)
- [x] Deployed to live server, verified `openclaw channels status --probe` reports Feishu `works`
- [x] Confirmed zero `bot/v3/info` calls in logs after initial startup probe (monitored 2+ minutes)

<h3>Greptile Summary</h3>

Implements TTL-based caching of Feishu bot info to reduce API quota consumption from ~43,200 calls/month to ~30 calls/month. After the first successful probe, cached bot metadata is reused and connectivity is validated via the SDK's internal token manager (quota-exempt). The cache key now properly uses `appId:domain` to handle credential rotations, and token validation explicitly checks for non-empty strings before returning success. Includes LRU-style eviction with a 64-entry cap and a `clearBotInfoCache()` export for testing.

<h3>Confidence Score: 4/5</h3>

- This PR is safe to merge with low risk
- The implementation addresses the Feishu quota exhaustion issue with a well-reasoned caching strategy. Previous review concerns about token validation and cache key design have been resolved (explicit token validation on lines 63-66, credential-based cache key on lines 35-37). The fall-through pattern properly invalidates stale cache entries. Minor risk remains from relying on internal SDK APIs (`tokenManager.getTenantAccessToken()`), but this is mitigated by explicit validation and fallback to full probe on errors.
- No files require special attention

<sub>Last reviewed commit: ac74ea3</sub>
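The cache properties named in the Greptile summary (an `appId:domain` key, TTL expiry, a 64-entry cap with LRU-style eviction, and a `clearBotInfoCache()` helper) might look roughly like the sketch below. Apart from `clearBotInfoCache`, every name is hypothetical. The 24-hour TTL is an assumption, chosen only because it would line up with about 30 `bot/v3/info` calls per month; the PR's actual value is not stated here.

```typescript
// Sketch only: names other than clearBotInfoCache are hypothetical.
type BotMetadata = { botName: string; botOpenId: string };
type CacheEntry = { info: BotMetadata; expiresAt: number };

const MAX_ENTRIES = 64; // cap mentioned in the review summary
const TTL_MS = 24 * 60 * 60 * 1000; // assumed TTL, not taken from the PR

const entries = new Map<string, CacheEntry>();

// Keying on appId and domain keeps entries for different credentials apart.
function cacheKey(appId: string, domain: string): string {
  return `${appId}:${domain}`;
}

function getBotInfo(key: string, now: number = Date.now()): BotMetadata | undefined {
  const entry = entries.get(key);
  if (!entry) return undefined;
  if (entry.expiresAt <= now) {
    // Expired: remove the entry so the caller performs a full probe.
    entries.delete(key);
    return undefined;
  }
  // Re-insert so this key becomes the most recently used one.
  entries.delete(key);
  entries.set(key, entry);
  return entry.info;
}

function setBotInfo(key: string, info: BotMetadata, now: number = Date.now()): void {
  entries.delete(key);
  if (entries.size >= MAX_ENTRIES) {
    // A Map iterates in insertion order, so the first key is the least recently used.
    const oldest = entries.keys().next().value;
    if (oldest !== undefined) entries.delete(oldest);
  }
  entries.set(key, { info, expiresAt: now + TTL_MS });
}

function clearBotInfoCache(): void {
  entries.clear();
}
```

Passing `now` explicitly keeps expiry testable without waiting on a real clock, and re-inserting on read is one simple way to approximate LRU ordering with a plain `Map`.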