#5924: fix(security): add advanced multi-turn attack detection
Cluster:
Security Enhancements and Fixes
## Summary
Adds stateful detection for sophisticated multi-turn prompt injection attacks.
**Part 3 of 3** from Operation CLAW FORTRESS security hardening (split from #5863 for easier review).
## New Files
| File | Purpose |
|------|---------|
| `src/security/injection-detection.ts` | Attack detection logic |
| `src/security/injection-detection.test.ts` | Comprehensive tests |
## Attack Types Detected
| Type | Description |
|------|-------------|
| `many_shot` | 3+ examples in message building a pattern |
| `crescendo` | Progressive trust-building across turns |
| `persona_hijack` | DAN, roleplay, developer mode injection |
| `cot_hijack` | Chain-of-thought manipulation |
| `authority_spoof` | Fake [ADMIN], [SYSTEM] markers |
| `false_memory` | Fabricated prior agreements |
| `indirect` | Hidden in code/HTML comments |
## API
\`\`\`typescript
// Quick check for obvious attacks
isLikelyAttack(content: string): boolean
// Full analysis with confidence scoring
detectAdvancedInjection(ctx: {
currentMessage: string;
recentHistory?: string[];
}): InjectionDetectionResult
\`\`\`
## ZeroLeaks Findings Addressed
- Many-shot priming (3.2, 3.9)
- Crescendo attacks (3.3, 3.10)
- Persona injection (3.6, 4.1)
- Authority spoofing (4.1)
## Test Plan
- [x] Unit tests for all attack types
- [x] Multi-turn conversation tests
- [x] Regression tests with ZeroLeaks payloads
🔒 Generated with [Claude Code](https://claude.ai/code)
<!-- greptile_comment -->
<h2>Greptile Overview</h2>
<h3>Greptile Summary</h3>
Adds a new `src/security/injection-detection.ts` module that detects several prompt-injection patterns (single-message and multi-turn), producing a `detected` flag, `attackTypes`, `confidence`, and human-readable `details`. Adds `src/security/injection-detection.test.ts` with unit/regression tests covering each attack type plus multi-turn scenarios, including a small suite of "ZeroLeaks" payload regressions.
This fits into the repo’s broader security hardening by providing a standalone classifier that callers can use either as a fast-path (`isLikelyAttack`) or a richer analysis (`detectAdvancedInjection`) that can incorporate recent conversation history.
<h3>Confidence Score: 3/5</h3>
- Mostly safe to merge, but there is a real determinism bug risk in regex matching that could cause inconsistent detection results.
- Core logic is straightforward and well-tested, but `hasMatch` relies on `RegExp.test` and some pattern sets include global regexes; this can lead to stateful `lastIndex` behavior and flaky/non-deterministic detections depending on call order.
- src/security/injection-detection.ts
<!-- greptile_other_comments_section -->
<sub>(5/5) You can turn off certain types of comments like style [here](https://app.greptile.com/review/github)!</sub>
**Context used:**
- Context from `dashboard` - CLAUDE.md ([source](https://app.greptile.com/review/custom-context?memory=fd949e91-5c3a-4ab5-90a1-cbe184fd6ce8))
- Context from `dashboard` - AGENTS.md ([source](https://app.greptile.com/review/custom-context?memory=0d0c8278-ef8e-4d6c-ab21-f5527e322f13))
<!-- /greptile_comment -->
Most Similar PRs
#8086: feat(security): Add prompt injection guard rail
by bobbythelobster · 2026-02-03
79.6%
#10559: feat(security): add plugin output scanner for prompt injection dete...
by DukeDeSouth · 2026-02-06
79.5%
#5923: fix(security): add input encoding detection and obfuscation decoder
by dan-redcupit · 2026-02-01
77.9%
#7346: Security: add hardening module and secure-bot extension
by AlphonseC · 2026-02-02
76.9%
#20106: security: MAESTRO threat mitigations (LM-001, SC-003, AF-005, DI-00...
by kenhuangus · 2026-02-18
75.2%
#6405: feat(security): Add HTTP API security hooks for plugin scanning
by masterfung · 2026-02-01
74.0%
#10514: Security: harden AGENTS.md with gateway, prompt injection, and supp...
by catpilothq · 2026-02-06
73.6%
#17273: feat: add security-guard extension — agentic safety guardrails
by miloudbelarebia · 2026-02-15
73.6%
#23174: feat(security): credential leak prevention — exfiltration patterns,...
by ihsanmokhlisse · 2026-02-22
73.4%
#6486: feat(security): add exec command denylist for defense-in-depth
by nia-agent-cyber · 2026-02-01
73.2%