#19113: fix: prevent duplicated text after tag stripping

by Clawborn open 2026-02-17 11:35

agents size: S trusted-contributor

Cluster: Tool Execution and Error Handling

When HTML-like tags are stripped from assistant output, the remaining text could be duplicated on the same line.

Fix: Improve the deduplication logic to handle tag-stripped content correctly.

Recreated from #16813 with only relevant files.

<h3>Greptile Summary</h3>

Adds same-line duplicate text detection to prevent duplicated assistant output after HTML-like tags (e.g., `</final>`) are stripped. The fix introduces a new `collapseSameLineDuplicates` function that uses a regex pattern to detect and remove sentence-level duplicates that occur on the same line.

**Key changes:**

- New `collapseSameLineDuplicates` function with regex `/(.{10,}?[.!?])\1/g` to detect duplicated sentences (minimum 10 characters, ending in `.`, `!`, or `?`)
- Modified `collapseConsecutiveDuplicateBlocks` to apply same-line deduplication before paragraph-level deduplication
- Updated return statements to preserve same-line deduplication results
- Added comprehensive test coverage for the new behavior
- Minor import reordering (type imports before regular imports)

<h3>Confidence Score: 5/5</h3>

- This PR is safe to merge with minimal risk
- The implementation is well-designed with appropriate safeguards (10-char minimum to avoid false positives), comprehensive test coverage, and correct integration into the existing deduplication pipeline. The regex pattern correctly uses non-greedy matching and backreferences, and the changes preserve the original text when no duplicates are found
- No files require special attention

<sub>Last reviewed commit: 5981e3d</sub>
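To make the regex in the summary concrete, here is a minimal sketch of how a function like `collapseSameLineDuplicates` could behave. Only the function name and the pattern `/(.{10,}?[.!?])\1/g` come from the summary; the function body, the `SAME_LINE_DUPLICATE` constant, and the tag-stripping step in the usage lines are assumptions for illustration and may differ from the code in the PR.

```typescript
// Pattern quoted in the summary: a lazily matched run of at least 10
// characters ending in '.', '!' or '?', immediately followed by an
// identical copy of itself (the \1 backreference). Because '.' does not
// match newlines, a match never spans more than one line.
const SAME_LINE_DUPLICATE = /(.{10,}?[.!?])\1/g;

// Assumed body: keep one copy of each back-to-back duplicated sentence.
// Text with no match is returned unchanged.
function collapseSameLineDuplicates(text: string): string {
  return text.replace(SAME_LINE_DUPLICATE, "$1");
}

// Hypothetical tag-stripping step, only here to reproduce the symptom.
const raw = "The answer is 42.</final>The answer is 42.";
const stripped = raw.replace(/<\/?final>/g, "");

console.log(stripped);                             // The answer is 42.The answer is 42.
console.log(collapseSameLineDuplicates(stripped)); // The answer is 42.
console.log(collapseSameLineDuplicates("Hi.Hi.")); // Hi.Hi. (below the length floor)
```

The last call shows the safeguard the review mentions: short repeats such as `Hi.Hi.` are left alone because the captured group must contain at least 10 characters before the closing punctuation.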