#8504: fix: prevent false positives in isSilentReplyText for CJK content
stale
## Problem
The `isSilentReplyText` function uses `\W*$` to allow trailing non-word characters after the silent reply token. However, in JavaScript regex, `\W` matches any character outside `[A-Za-z0-9_]`, which includes **every non-ASCII character**, CJK (Chinese/Japanese/Korean) characters among them.
This caused false positives where messages containing actual content before or after `NO_REPLY` were incorrectly filtered:
```
'测试 NO_REPLY 内容' => true // BUG: should be false
'好的 NO_REPLY' => true // BUG: should be false
```
## Root Cause
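`\W` in JavaScript regex is equivalent to `[^a-zA-Z0-9_]`, so every character other than an ASCII letter, digit, or underscore counts as a 'non-word' character. That includes all CJK characters, so `\W*$` consumes any CJK text that follows the token. The suffix pattern is also not anchored with `^`, so text before the token is never checked.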
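The behaviour described above can be reproduced in isolation. The following is a minimal sketch, not code from the PR: `isNonWord` is a hypothetical helper, and the token is inlined as `NO_REPLY` rather than passed through an escaping step.

```typescript
// The pre-fix suffix pattern, with the token inlined for brevity (assumption:
// the real code builds it from an escaped token variable).
const oldSuffix = new RegExp("\\bNO_REPLY\\b\\W*$");

// Hypothetical helper: does a single character count as "non-word" for \W?
function isNonWord(ch: string): boolean {
  return new RegExp("^\\W$").test(ch);
}

console.log(isNonWord("a"));  // false: ASCII letter
console.log(isNonWord("_"));  // false: underscore is a word character
console.log(isNonWord("."));  // true: ASCII punctuation
console.log(isNonWord("测")); // true: CJK letter, but still "non-word"

// Because CJK text is "non-word", \W*$ swallows it as if it were punctuation.
console.log(oldSuffix.test("NO_REPLY 测试")); // true (false positive)
console.log(oldSuffix.test("NO_REPLY done")); // false (ASCII letters block the match)
```

The last two lines show the asymmetry: trailing ASCII text is rejected, while trailing CJK text is silently accepted.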
## Fix
Replace the loose regex with a Unicode-aware pattern, anchored at both ends, that allows only whitespace and `\p{P}` (Unicode punctuation category) characters around the token. The `u` flag is required for `\p{...}` to be recognized:
```typescript
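// Before (buggy): \W* also consumes CJK and other non-ASCII text after the token
const suffix = new RegExp(`\\b${escaped}\\b\\W*$`);
// After (fixed): only whitespace and Unicode punctuation may surround the token
const pattern = new RegExp(`^[\\s\\p{P}]*${escaped}[\\s\\p{P}]*$`, 'u');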
```
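For context, here is a hedged sketch of how the fixed pattern might sit inside a complete function. Only the regex itself comes from this PR. The name `isSilentReplyTextSketch`, the `SILENT_REPLY_TOKEN` constant, the `escapeRegExp` helper, and the signature are assumptions for illustration and may differ from `src/auto-reply/tokens.ts`.

```typescript
// Assumed default token; the real constant name and value may differ.
const SILENT_REPLY_TOKEN = "NO_REPLY";

// Hypothetical helper: escape regex syntax characters in the token.
function escapeRegExp(value: string): string {
  return value.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Sketch of the whole-text check: the token may be surrounded only by
// whitespace and Unicode punctuation (\p{P}), nothing else.
function isSilentReplyTextSketch(
  text: string | undefined,
  token: string = SILENT_REPLY_TOKEN,
): boolean {
  if (!text) return false;
  const escaped = escapeRegExp(token);
  const pattern = new RegExp(`^[\\s\\p{P}]*${escaped}[\\s\\p{P}]*$`, "u");
  return pattern.test(text);
}

console.log(isSilentReplyTextSketch("NO_REPLY."));     // true
console.log(isSilentReplyTextSketch("NO_REPLY。"));    // true: CJK full stop is punctuation
console.log(isSilentReplyTextSketch("测试 NO_REPLY")); // false
console.log(isSilentReplyTextSketch("NO_REPLY 测试")); // false
```

Note that `_` belongs to `\p{P}` (connector punctuation), so under this pattern a string such as `_NO_REPLY_` would also count as a silent reply.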
## Test Results
| Input | Before | After |
|-------|--------|-------|
| `NO_REPLY` | ✅ true | ✅ true |
| `NO_REPLY.` | ✅ true | ✅ true |
| ` NO_REPLY ` | ✅ true | ✅ true |
| `测试 NO_REPLY` | ❌ true | ✅ false |
| `NO_REPLY 测试` | ❌ true | ✅ false |
| `这条消息有 NO_REPLY 内容` | ❌ true | ✅ false |
Added unit tests in `tokens.test.ts` to prevent regression.
<!-- greptile_comment -->
<h2>Greptile Overview</h2>
<h3>Greptile Summary</h3>
This PR updates `src/auto-reply/tokens.ts` to make `isSilentReplyText` Unicode-aware by replacing the previous `\W*$`-based suffix check (which treated CJK letters as “non-word”) with a `u`-flag regex that only permits whitespace and Unicode punctuation around the silent-reply token. It also adds Vitest coverage in `src/auto-reply/tokens.test.ts` for whitespace/punctuation cases plus CJK regression cases to prevent false positives.
One thing to double-check is the tightened matching semantics: the new pattern matches only when the *entire message* is token + optional whitespace/punctuation, and `\p{P}` may be narrower than desired for real-world “punctuation-like” characters. Separately, the PR introduces a new `package-lock.json`, which may be unintentional given the repo’s pnpm-first workflow.
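To make the `\p{P}` concern concrete, the sketch below classifies a few characters by Unicode general category. It is illustrative only: `classify` is a hypothetical helper, and the token is inlined as `NO_REPLY`. Characters such as `~`, `+`, and emoji fall under the Symbol categories (`\p{S}`), not Punctuation, so the new pattern treats them as real content.

```typescript
const punctuation = new RegExp("^\\p{P}$", "u");
const symbol = new RegExp("^\\p{S}$", "u");

// Hypothetical helper: report which broad category a single character is in.
function classify(ch: string): "punctuation" | "symbol" | "other" {
  if (punctuation.test(ch)) return "punctuation";
  if (symbol.test(ch)) return "symbol";
  return "other";
}

console.log(classify("."));  // punctuation
console.log(classify("。")); // punctuation
console.log(classify("~"));  // symbol
console.log(classify("+"));  // symbol
console.log(classify("🙂")); // symbol
console.log(classify("测")); // other (a letter)

// The fixed pattern from this PR, with the token inlined.
const strict = new RegExp("^[\\s\\p{P}]*NO_REPLY[\\s\\p{P}]*$", "u");
console.log(strict.test("NO_REPLY!")); // true
console.log(strict.test("NO_REPLY~")); // false: "~" is a symbol, not punctuation
```

Whether symbol characters should be ignorable is a product decision. Widening the class to `[\s\p{P}\p{S}]` would be one option, mentioned here only as an illustration and not as part of this PR.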
<h3>Confidence Score: 4/5</h3>
- This PR is likely safe to merge; changes are localized and covered by targeted unit tests.
- The regex change directly addresses the reported `\W`/Unicode behavior and the added tests cover the original CJK false-positive scenarios. Main remaining risks are subtle behavior changes in what characters are considered ignorable around the token (Unicode category coverage) and the accidental addition of a new lockfile.
- Files needing attention: `src/auto-reply/tokens.ts` (regex semantics) and `package-lock.json` (confirm it’s intended).
<!-- greptile_other_comments_section -->
<!-- /greptile_comment -->
Most Similar PRs
#19916: fix: strict silent-reply detection to prevent false positives with ...
by hayoial · 2026-02-18
89.6%
#19576: fix: tighten isSilentReplyText to match whole-text only
by aldoeliacim · 2026-02-18
82.8%
#16096: fix(i18n): use Unicode-aware word boundaries for non-ASCII language...
by PeterRosdahl · 2026-02-14
81.4%
#19675: fix(security): prevent zero-width Unicode chars from bypassing boun...
by williamzujkowski · 2026-02-18
77.5%
#17244: fix: strip TTS tags from agent replies before delivery (#14652)
by robbyczgw-cla · 2026-02-15
76.9%
#17686: fix(memory): support non-ASCII characters in FTS query tokenization
by Phineas1500 · 2026-02-16
76.8%
#16411: fix(agents): support CJK sentence punctuation in block chunker
by ciberponk · 2026-02-14
75.8%
#16733: fix(ui): avoid injected newlines when tool output is hidden
by jp117 · 2026-02-15
75.1%
#12325: fix: trim leading/trailing whitespace from outbound messages
by jordanstern · 2026-02-09
74.4%
#16962: fix: make auth error detection contextual to prevent false positives
by StressTestor · 2026-02-15
74.2%