
Send a SCOUT First: Pre-hoc Reasoning for Adaptive Detector Allocation in Prompt-Injection Defense
[Submitted on 29 May 2026] · 2026-06-01 · via cs updates on arXiv.org

Abstract: Prompt-injection detectors are heterogeneous: each is strong on a different slice of attacks, and none is always reliable. Yet existing systems still treat detection as a fixed single-detector pipeline, committing every request to one detector's blind spots. We reframe defense as detector allocation: given a heterogeneous pool, decide per request which detectors to run and whether to escalate to an LLM judge. Our framework SCOUT (Scalable and Controllable Outcome-prediction for Uncertainty-aware Triage) makes this decision dynamic by predicting each detector's per-sample reliability and latency from how it behaved on similar past inputs, and exposes a single safety-utility threshold to the operator (where utility bundles benign-pass rate and wall-clock time). To evaluate this setting, we build SCOUT-450, a benchmark that captures the structurally complex, agent-facing injections that older prompt-injection sets under-represent. On SCOUT-450, a safety-oriented operating point reduces attack-success rate by 46% and total wall-clock time by 40% relative to an always-on GPT-4o judge, at a 5.1-point benign-utility drop. SCOUT also transfers to three external benchmarks (BIPIA, IPI, and IHEval), improving the safety-utility frontier.
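To make the allocation idea concrete, here is a minimal Python sketch of per-request detector selection. It is not the authors' implementation: the abstract does not say how SCOUT predicts reliability or latency, so the nearest-neighbour averaging, the cosine similarity, the tie-breaking rule, and every name here (`PastSample`, `predict`, `allocate`, `threshold`) are assumptions made for illustration only.

```python
from dataclasses import dataclass
from math import sqrt


@dataclass
class PastSample:
    """One previously seen input (hypothetical record layout)."""
    embedding: list[float]
    correct: dict[str, bool]       # detector name -> was its verdict right?
    latency_ms: dict[str, float]   # detector name -> observed latency


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = sqrt(sum(x * x for x in a))
    nb = sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def predict(history, query, detectors, k=3):
    """Estimate (reliability, latency) per detector from the k most similar past inputs."""
    neighbors = sorted(
        history, key=lambda s: cosine(s.embedding, query), reverse=True
    )[:k]
    estimates = {}
    for d in detectors:
        reliability = sum(s.correct[d] for s in neighbors) / len(neighbors)
        latency = sum(s.latency_ms[d] for s in neighbors) / len(neighbors)
        estimates[d] = (reliability, latency)
    return estimates


def allocate(history, query, detectors, threshold, k=3):
    """Pick the detector with the highest predicted reliability (ties go to the
    faster one) and flag escalation to an LLM judge when that reliability
    falls below the operator's threshold."""
    estimates = predict(history, query, detectors, k)
    best = max(detectors, key=lambda d: (estimates[d][0], -estimates[d][1]))
    escalate = estimates[best][0] < threshold
    return best, escalate
```

In this sketch a single `threshold` plays the role of the operator-facing knob: raising it sends more requests to the judge (safer, slower), and lowering it trusts the cheap detectors more often. A real system would presumably also weigh predicted latency against reliability and could run several detectors per request, which this sketch leaves out.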

Submission history

From: Shuhao Zhang
[v1] Fri, 29 May 2026 04:49:20 UTC (21,540 KB)