#23077: fix: block chunker breaks at arbitrary whitespace after minChars - AI assisted
agents
size: M
The block chunker's whitespace fallback was firing too early: as soon as the buffer exceeded `minChars` (e.g. 300), it fell back to splitting at any whitespace instead of waiting for `maxChars` (e.g. 800) to find proper structural break points. This caused mid-phrase breaks like "**Zoom | Access**" during token-by-token streaming in Teams, which broke formatting and made reading more difficult.
**Fix:** The whitespace fallback in `#pickBreakIndex()` now only fires when `buffer.length >= maxChars`, giving the preferred break type (sentence, paragraph, etc.) the full `minChars..maxChars` window to find a structural break.
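The gating described above can be sketched roughly as follows. This is a simplified, hypothetical stand-in rather than the actual `#pickBreakIndex()` implementation: the helper names, the regex patterns, the choice of the last matching break in the window, and the hard cut at `maxChars` when no whitespace exists are all assumptions made for illustration.

```typescript
// Hypothetical, simplified stand-in for the chunker's break selection.
// Only minChars, maxChars and the preference values come from the PR text.
type BreakPreference = "paragraph" | "newline" | "sentence";

interface ChunkLimits {
  minChars: number;
  maxChars: number;
}

// Assumed patterns for each structural break type.
const STRUCTURAL_BREAKS: Record<BreakPreference, RegExp> = {
  paragraph: /\n\n/,
  newline: /\n/,
  sentence: /[.!?](?=\s)/,
};

// Last structural break whose end lies at or past minChars, or -1.
function lastStructuralBreak(head: string, preference: BreakPreference, minChars: number): number {
  const re = new RegExp(STRUCTURAL_BREAKS[preference].source, "g");
  let best = -1;
  let m: RegExpExecArray | null;
  while ((m = re.exec(head)) !== null) {
    const end = m.index + m[0].length;
    if (end >= minChars) best = end;
  }
  return best;
}

// Last whitespace position at or past minChars, or -1.
function lastWhitespace(head: string, minChars: number): number {
  for (let i = head.length - 1; i >= minChars; i--) {
    if (/\s/.test(head.charAt(i))) return i;
  }
  return -1;
}

// Returns the index at which to cut `buffer`, or -1 to keep accumulating.
function pickBreakIndex(buffer: string, preference: BreakPreference, limits: ChunkLimits): number {
  const { minChars, maxChars } = limits;
  if (buffer.length < minChars) return -1;
  const head = buffer.slice(0, maxChars);

  // 1. The preferred structural break gets the whole minChars..maxChars window.
  const structural = lastStructuralBreak(head, preference, minChars);
  if (structural !== -1) return structural;

  // 2. Whitespace fallback fires only once the buffer has reached maxChars.
  if (buffer.length >= maxChars) {
    const ws = lastWhitespace(head, minChars);
    return ws !== -1 ? ws : maxChars; // assumed hard cut as a last resort
  }
  return -1;
}
```

With `minChars: 300` and `maxChars: 800`, a 400-character buffer that contains no sentence end returns `-1` in this sketch, so the chunker keeps accumulating instead of cutting at the nearest space.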
## Summary
- **Problem**: Block streaming messages split mid-phrase at arbitrary whitespace past `minChars`, producing broken formatting like bold headers split across messages
- **Why it matters**: Broken formatting in Teams/Discord/Telegram makes bot responses hard to read
- **What changed**: In `#pickBreakIndex()`, the whitespace-only fallback threshold moved from `minChars` to `maxChars`. In `#pickSoftBreakIndex()`, the premature whitespace path was removed. The preferred break type still gets first priority, and preference semantics are unchanged.
- **What did NOT change**: No API changes, no changes to the coalescer or streaming pipeline. Break preference behavior is preserved.
- **New**: `breakFallbacks` config option for custom fallback chains. Default for paragraph mode: `["newline", "sentence"]` (matches pre-refactor behavior). Consolidated duplicate type aliases into `BreakPreferenceType`. Added schema help text.
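One way to picture how a `breakFallbacks` chain might be resolved is the sketch below. The function name `resolveBreakChain`, the dedup rule, and the empty defaults for non-paragraph modes are assumptions for illustration. Only the paragraph default `["newline", "sentence"]` and the `BreakPreferenceType` name are taken from this description.

```typescript
// Hypothetical sketch of fallback-chain resolution; not the PR's actual code.
type BreakPreferenceType = "paragraph" | "newline" | "sentence";

// Only the paragraph default is stated in the PR; other modes are left
// without a default here (an assumption, not documented behavior).
const DEFAULT_FALLBACKS: Partial<Record<BreakPreferenceType, BreakPreferenceType[]>> = {
  paragraph: ["newline", "sentence"],
};

// Returns the ordered list of break types to try: the preference first,
// then each fallback in the given order, skipping duplicates.
function resolveBreakChain(
  preference: BreakPreferenceType,
  breakFallbacks?: BreakPreferenceType[],
): BreakPreferenceType[] {
  const fallbacks = breakFallbacks ?? DEFAULT_FALLBACKS[preference] ?? [];
  const chain: BreakPreferenceType[] = [preference];
  for (const candidate of fallbacks) {
    if (chain.indexOf(candidate) === -1) chain.push(candidate);
  }
  return chain;
}

const paragraphChain = resolveBreakChain("paragraph");
const customChain = resolveBreakChain("sentence", ["sentence", "newline", "newline"]);
```

Under these assumptions, `paragraphChain` is `["paragraph", "newline", "sentence"]` and `customChain` collapses to `["sentence", "newline"]`, with the preference always tried first.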
## Change Type (select all)
- [x] Bug fix
- [x] Refactor
- [ ] Feature
- [ ] Docs
- [ ] Security hardening
- [ ] Chore/infra
## Scope (select all touched areas)
- [ ] Gateway / orchestration
- [ ] Skills / tool execution
- [ ] Auth / tokens
- [ ] Memory / storage
- [x] Integrations
- [ ] API / contracts
- [x] UI / DX
- [ ] CI/CD / infra
## Linked Issue/PR
- Related #579 (Signal chunking — same premature break behavior)
- Related #17790 (Telegram paragraph splitting — related chunker behavior)
- Related #21329 (Slack streaming truncation — may benefit from this fix)
## User-visible / Behavior Changes
- Streamed block messages now accumulate up to `maxChars` before falling back to whitespace breaks (previously fell back at `minChars`)
- Messages break at the preferred structural boundary (sentence, paragraph, newline) within the `minChars..maxChars` window, as originally intended
- No config changes needed — existing `blockStreamingChunk` settings work correctly now
- New `breakFallbacks` config option allows customizing the fallback chain per break preference
## Security Impact (required)
- New permissions/capabilities? `No`
- Secrets/tokens handling changed? `No`
- New/changed network calls? `No`
- Command/tool execution surface changed? `No`
- Data access scope changed? `No`
## Repro + Verification
### Environment
- OS: Linux (WSL2)
- Runtime/container: Node v24.13.1
- Model/provider: qwen3-8b via LMStudio (local)
- Integration/channel: MS Teams
- Relevant config: `blockStreamingChunk: { breakPreference: "sentence" }`, `blockStreamingCoalesce: { minChars: 300, maxChars: 800, idleMs: 1500 }`
### Steps
1. Configure block streaming with `breakPreference: "sentence"`, `minChars: 300`, `maxChars: 800`
2. Ask the bot to summarize emails (produces multi-line formatted output with bold headers)
3. Observe how streamed messages are split in the channel
### Expected
Messages break at sentence boundaries within the 300-800 char window, never mid-phrase.
### Actual (before fix)
Messages split at arbitrary whitespace as soon as the buffer exceeds 300 chars (`minChars`), ignoring the 800-char `maxChars` window. E.g. "**Zoom" in one message and "Access**" in the next.
## Evidence
- [x] Failing test/log before + passing after
- [x] Trace/log snippets
New test file `pi-embedded-block-chunker.sentence.test.ts` reproduces the exact email content that caused bad splits. Tests cover:
- Bulk append with force flush
- Token-by-token streaming (char-by-char, the real streaming scenario)
- Paragraph mode comparison
- breakFallbacks deduplication and ordering
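The token-by-token scenario can be approximated with a small self-contained harness like the one below. `MiniSentenceChunker` is a hypothetical, reduced model written for this illustration, not the class under test, and the sample text is synthetic rather than the email content used in the real test file. It appends one character at a time, drains after each append, and force-flushes at the end.

```typescript
// Hypothetical miniature chunker (sentence preference only) for illustration.
class MiniSentenceChunker {
  private buffer = "";
  private readonly minChars: number;
  private readonly maxChars: number;

  constructor(minChars: number, maxChars: number) {
    this.minChars = Math.max(1, minChars); // guard against zero-length cuts
    this.maxChars = maxChars;
  }

  append(text: string): void {
    this.buffer += text;
  }

  // Emits every chunk that is ready; with force=true also flushes the rest.
  drain(force: boolean, emit: (chunk: string) => void): void {
    for (;;) {
      const cut = this.pickBreak();
      if (cut === -1) break;
      emit(this.buffer.slice(0, cut).trim());
      this.buffer = this.buffer.slice(cut).replace(/^\s+/, "");
    }
    if (force && this.buffer.trim().length > 0) {
      emit(this.buffer.trim());
      this.buffer = "";
    }
  }

  private pickBreak(): number {
    if (this.buffer.length < this.minChars) return -1;
    const head = this.buffer.slice(0, this.maxChars);
    // First sentence end at or past minChars (punctuation followed by whitespace).
    const re = /[.!?](?=\s)/g;
    let m: RegExpExecArray | null;
    while ((m = re.exec(head)) !== null) {
      const end = m.index + 1;
      if (end >= this.minChars) return end;
    }
    // Whitespace fallback only once maxChars is reached.
    if (this.buffer.length >= this.maxChars) {
      for (let i = head.length - 1; i >= this.minChars; i--) {
        if (/\s/.test(head.charAt(i))) return i;
      }
      return this.maxChars;
    }
    return -1;
  }
}

// Synthetic text in which the bold header straddles the 300-char mark.
const filler = "This sentence pads the buffer. ";
let text = "";
for (let i = 0; i < 9; i++) text += filler;
text += "Heads up now. **Zoom | Access** needs a new login before Friday. ";
for (let i = 0; i < 4; i++) text += filler;

const chunks: string[] = [];
const chunker = new MiniSentenceChunker(300, 800);
for (const ch of text.split("")) {
  chunker.append(ch);
  chunker.drain(false, (c) => chunks.push(c));
}
chunker.drain(true, (c) => chunks.push(c));
```

In this model the first chunk is held back until the sentence containing `**Zoom | Access**` is complete, so the bold markers stay balanced within a single message.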
## Human Verification (required)
- Verified scenarios: Real email summary output in MS Teams — messages now break at sentence boundaries instead of mid-phrase
- Edge cases checked: Token-by-token streaming (char-by-char append + drain), bulk append, fenced code blocks, all three break preferences
- What I did **not** verify: Discord and Telegram channels. They use the same chunker, so they should benefit equally.
## Compatibility / Migration
- Backward compatible? `Yes`
- Config/env changes? `No` (new `breakFallbacks` option is optional with backward-compatible defaults)
- Migration needed? `No`
## Failure Recovery (if this breaks)
- How to disable/revert this change quickly: Revert the commits on `pi-embedded-block-chunker.ts`
- Files/config to restore: `src/agents/pi-embedded-block-chunker.ts`
- Known bad symptoms: Messages accumulating too long without splitting (would indicate maxChars threshold too high in user config)
## Risks and Mitigations
- Risk: Messages may accumulate slightly longer before the first split (up to `maxChars` instead of `minChars` before whitespace fallback)
- Mitigation: This is the intended behavior — `maxChars` is the configured upper bound. Users who want more frequent splits can lower `maxChars`.
Opus 4.6 assisted