← Back to PRs

#5924: fix(security): add advanced multi-turn attack detection

by dan-redcupit open 2026-02-01 03:47 View on GitHub →
## Summary Adds stateful detection for sophisticated multi-turn prompt injection attacks. **Part 3 of 3** from Operation CLAW FORTRESS security hardening (split from #5863 for easier review). ## New Files | File | Purpose | |------|---------| | `src/security/injection-detection.ts` | Attack detection logic | | `src/security/injection-detection.test.ts` | Comprehensive tests | ## Attack Types Detected | Type | Description | |------|-------------| | `many_shot` | 3+ examples in message building a pattern | | `crescendo` | Progressive trust-building across turns | | `persona_hijack` | DAN, roleplay, developer mode injection | | `cot_hijack` | Chain-of-thought manipulation | | `authority_spoof` | Fake [ADMIN], [SYSTEM] markers | | `false_memory` | Fabricated prior agreements | | `indirect` | Hidden in code/HTML comments | ## API \`\`\`typescript // Quick check for obvious attacks isLikelyAttack(content: string): boolean // Full analysis with confidence scoring detectAdvancedInjection(ctx: { currentMessage: string; recentHistory?: string[]; }): InjectionDetectionResult \`\`\` ## ZeroLeaks Findings Addressed - Many-shot priming (3.2, 3.9) - Crescendo attacks (3.3, 3.10) - Persona injection (3.6, 4.1) - Authority spoofing (4.1) ## Test Plan - [x] Unit tests for all attack types - [x] Multi-turn conversation tests - [x] Regression tests with ZeroLeaks payloads 🔒 Generated with [Claude Code](https://claude.ai/code) <!-- greptile_comment --> <h2>Greptile Overview</h2> <h3>Greptile Summary</h3> Adds a new `src/security/injection-detection.ts` module that detects several prompt-injection patterns (single-message and multi-turn), producing a `detected` flag, `attackTypes`, `confidence`, and human-readable `details`. Adds `src/security/injection-detection.test.ts` with unit/regression tests covering each attack type plus multi-turn scenarios, including a small suite of "ZeroLeaks" payload regressions. This fits into the repo’s broader security hardening by providing a standalone classifier that callers can use either as a fast-path (`isLikelyAttack`) or a richer analysis (`detectAdvancedInjection`) that can incorporate recent conversation history. <h3>Confidence Score: 3/5</h3> - Mostly safe to merge, but there is a real determinism bug risk in regex matching that could cause inconsistent detection results. - Core logic is straightforward and well-tested, but `hasMatch` relies on `RegExp.test` and some pattern sets include global regexes; this can lead to stateful `lastIndex` behavior and flaky/non-deterministic detections depending on call order. - src/security/injection-detection.ts <!-- greptile_other_comments_section --> <sub>(5/5) You can turn off certain types of comments like style [here](https://app.greptile.com/review/github)!</sub> **Context used:** - Context from `dashboard` - CLAUDE.md ([source](https://app.greptile.com/review/custom-context?memory=fd949e91-5c3a-4ab5-90a1-cbe184fd6ce8)) - Context from `dashboard` - AGENTS.md ([source](https://app.greptile.com/review/custom-context?memory=0d0c8278-ef8e-4d6c-ab21-f5527e322f13)) <!-- /greptile_comment -->

Most Similar PRs