#18185: feat(slack): add configurable rateLimitPolicy for Slack API calls
docs
channel: slack
gateway
size: M
Cluster:
Signal and Discord Fixes
Slack users.list (Tier 2, ~20 req/min) rate limiting can cause cascading failures: 429 retries starve the WebSocket connection, killing pong replies and eventually DNS resolution. When all allowlist entries are user IDs or emails, we can resolve them with users.info (Tier 4, ~100+ req/min) or users.lookupByEmail (Tier 3) in a single call each.
Add `rateLimitPolicy?: "retry" | "fail-fast"` to SlackAccountConfig:
- "retry" (default): original behavior — always paginate users.list, retry 429s in directory operations. Zero behavioral change.
- "fail-fast": use targeted APIs (users.info, users.lookupByEmail) for ID/email entries, reject rate-limited calls immediately in bulk directory operations. Only falls back to users.list for name entries.
Threaded through all callers: monitor/provider, commands-allowlist, and onboarding wizard.
## Summary
- **Problem:** `resolveSlackUserAllowlist()` always paginates ALL workspace users via `users.list` (Tier 2, ~20 req/min) even when every entry is a user ID resolvable with a single `users.info` call (Tier 4,~100+ req/min).
- **Why it matters:** On large workspaces (i.e enteprise workspace), 429 retry loops starve the WebSocket connection: pong replies time out, then DNS resolution fails, cascading into full disconnection.
- **What changed:** Added `rateLimitPolicy?: "retry" | "fail-fast"` config option. When `"fail-fast"`: IDs resolve via `users.info`, emails via `users.lookupByEmail`, directory ops use `rejectRateLimitedCalls:true`. Names still fall back to `users.list`. Threaded the config through all 5 callers.
- **What did NOT change (scope boundary):** Default behavior is identical: no config means `"retry"`, which is the existing `users.list` + retry path. No changes to Slack event handling, message processing, or WebSocket management.
Issue Example:
```
15:56:25 [slack] [default] starting provider
09:56:25 [WARN] bolt-app http request failed getaddrinfo ENOTFOUND slack.com
09:56:25 [WARN] bolt-app http request failed getaddrinfo ENOTFOUND slack.com
09:56:26 [WARN] bolt-app http request failed getaddrinfo ENOTFOUND slack.com
09:56:26 [WARN] bolt-app http request failed getaddrinfo ENOTFOUND slack.com
09:56:26 [WARN] web-api:WebClient:0 http request failed getaddrinfo ENOTFOUND slack.com
09:56:27 [WARN] web-api:WebClient:0 http request failed getaddrinfo ENOTFOUND slack.com
09:56:27 [WARN] bolt-app http request failed getaddrinfo ENOTFOUND slack.com
09:56:27 [openclaw] Unhandled promise rejection: Error: A request error occurred: getaddrinfo ENOTFOUND slack.com
at requestErrorWithOriginal (/Users/mterns/Documents/AI/openclaw/node_modules/.pnpm/@slack+web-api@7.14.0/node_modules/@slack/web-api/src/errors.ts:82:5)
at WebClient.<anonymous> (/Users/mterns/Documents/AI/openclaw/node_modules/.pnpm/@slack+web-api@7.14.0/node_modules/@slack/web-api/src/WebClient.ts:767:43)
at Generator.throw (<anonymous>)
at rejected (/Users/mterns/Documents/AI/openclaw/node_modules/.pnpm/@slack+web-api@7.14.0/node_modules/@slack/web-api/dist/WebClient.js:39:65)
at processTicksAndRejections (node:internal/process/task_queues:105:5)
ELIFECYCLE Command failed with exit code 1.
```
## Change Type (select all)
- [ ] Bug fix
- [ x] Feature
- [ ] Refactor
- [ ] Docs
- [ ] Security hardening
- [ ] Chore/infra
## Scope (select all touched areas)
- [ ] Gateway / orchestration
- [ ] Skills / tool execution
- [ ] Auth / tokens
- [ ] Memory / storage
- [ x] Integrations
- [x ] API / contracts
- [ ] UI / DX
- [ ] CI/CD / infra
## Linked Issue/PR
- Related: rate-limit cascading failure observed after rebasing with main
## User-visible / Behavior Changes
- New optional config field: `channels.slack.rateLimitPolicy` (`"retry"` | `"fail-fast"`)
- Default (`"retry"` / unset): no change whatsoever
- `"fail-fast"`: allowlist resolution uses targeted APIs instead of full workspace pagination; directory lookups reject 429s immediately instead of retrying
## Security Impact (required)
- New permissions/capabilities? `No`
- Secrets/tokens handling changed? `No`
- New/changed network calls? `Yes` — when `"fail-fast"`, calls `users.info` and `users.lookupByEmail` instead of `users.list`. Same token, same Slack API, narrower scope (single user vs. all users).
- Command/tool execution surface changed? `No`
- Data access scope changed? `No` — targeted APIs return the same user fields as `users.list`, just for one user at a time.
## Repro + Verification
### Environment
- OS: macOS Darwin 25.1.0
- Runtime: Node 22 / pnpm
- Integration: Slack (socket mode)
- Relevant config: `{ "channels": { "slack": { "rateLimitPolicy": "fail-fast" } } }`
### Steps
1. Set `rateLimitPolicy: "fail-fast"` in Slack config with ID-only allowlist entries
2. Start the bot — observe allowlist resolution uses `users.info` per ID (no `users.list` pagination)
3. Remove `rateLimitPolicy` — observe original `users.list` behavior unchanged
### Expected
- `"fail-fast"`: resolves IDs via `users.info`, no `users.list` calls when all entries are IDs
- Default: identical to current main branch behavior
### Actual
- Build passes, tests pass (no new failures)
## Evidence
- [x] `pnpm build` — clean
- [x] `pnpm check` — pre-commit hooks: 0 warnings, 0 errors on all changed files
- [x] `pnpm test` — no new failures (13 failures are pre-existing memory tests on main)
## Human Verification (required)
- **Verified scenarios:** Build, lint, format, full test suite
- **Edge cases checked:** Mixed allowlist (IDs + names) falls back to `users.list` only for the name entries; empty/unset `rateLimitPolicy` preserves exact original code path
## Compatibility / Migration
- Backward compatible? `Yes` — default is `"retry"`, identical to current behavior
- Config/env changes? `Yes` — new optional field `rateLimitPolicy` in `SlackAccountConfig`
- Migration needed? `No` — existing configs work without changes
## Failure Recovery (if this breaks)
- How to disable/revert: remove `rateLimitPolicy` from config (or set to `"retry"`) — reverts to original code path immediately, no restart needed beyond config reload
- Known bad symptoms: if `"fail-fast"` is set and the bot needs name-based resolution on a rate-limited workspace, those name lookups still hit `users.list` and will fail fast instead of retrying
## Risks and Mitigations
- **Risk:** `"fail-fast"` mode rejects 429s in directory operations, so large-workspace directory lookups that previously would have eventually succeeded will now fail.
- **Mitigation:** Opt-in only — users must explicitly set `rateLimitPolicy: "fail-fast"`. Default behavior is unchanged. Removing the config key instantly reverts.
<!-- greptile_comment -->
<h3>Greptile Summary</h3>
Adds optional `rateLimitPolicy` config field to prevent Slack rate-limit cascading failures. When set to `"fail-fast"`, resolves allowlist entries using targeted APIs (`users.info`, `users.lookupByEmail`) instead of paginating `users.list`, and rejects 429s immediately in bulk directory operations.
**Key changes:**
- New config field `channels.slack.rateLimitPolicy` (`"retry"` | `"fail-fast"`, default `"retry"`)
- `createSlackWebClientBulk` helper creates clients with `rejectRateLimitedCalls: true`
- `resolveViaTargetedApis` resolves IDs/emails via Tier 3/4 APIs, only falls back to `users.list` for name entries
- Threaded through allowlist resolution, directory lookups, and onboarding
**Critical issue:** `resolve-users.ts:274` doesn't use `createSlackWebClientBulk` for fail-fast mode, so targeted API calls and the `listSlackUsers` fallback will still retry on 429s instead of failing immediately. This undermines the entire purpose of the fail-fast policy.
<h3>Confidence Score: 2/5</h3>
- This PR contains a critical logic bug that breaks the core "fail-fast" functionality
- The implementation has a significant bug in `resolve-users.ts` where the fail-fast policy doesn't actually use a client configured to reject rate-limited calls. This means the feature won't work as intended when `rateLimitPolicy: "fail-fast"` is set - it will still retry on 429s instead of failing fast. The fix is straightforward (use `createSlackWebClientBulk` when fail-fast is enabled), but the bug must be addressed before merging.
- `src/slack/resolve-users.ts` requires immediate fix - the fail-fast client implementation is broken
<sub>Last reviewed commit: 59add71</sub>
<!-- greptile_other_comments_section -->
<sub>(5/5) You can turn off certain types of comments like style [here](https://app.greptile.com/review/github)!</sub>
<!-- /greptile_comment -->
Most Similar PRs
#5312: fix(slack): skip users.list API call when all user entries are IDs
by gunpyo-park-musinsa · 2026-01-31
73.5%
#17579: fix(slack): prevent Zod default groupPolicy from breaking multi-acc...
by ZetiMente · 2026-02-15
71.7%
#23056: fix(slack): exclude archived channels from startup channel resolution
by dbachelder · 2026-02-21
71.5%
#20479: fix(slack): keep replies flowing for oversized file uploads
by olyashok · 2026-02-19
71.0%
#7592: fix(slack): prevent unbounded memory growth in channel type cache
by namratabhaumik · 2026-02-03
69.8%
#17879: fix: prevent Slack auth errors from crashing the entire gateway
by zuyan9 · 2026-02-16
69.3%
#13686: Add opt-in rate limiting and token-based budgets for external API c...
by ShresthSamyak · 2026-02-10
69.0%
#23320: fix(slack): respect replyToMode when incomingThreadTs is auto-created
by dorukardahan · 2026-02-22
68.9%
#8024: fix(slack): resolve channel names via directory for cross-account m...
by emma-digital-assistant · 2026-02-03
68.7%
#19083: Slack: preserve per-thread context and consistent thread replies
by jkimbo · 2026-02-17
68.5%