惯性聚合 高效追踪和阅读你感兴趣的博客、新闻、科技资讯
阅读原文 在惯性聚合中打开

推荐订阅源

H
Help Net Security
T
ThreatConnect
SecWiki News
SecWiki News
F
Future of Privacy Forum
AWS News Blog
AWS News Blog
C
Cisco Blogs
A
Arctic Wolf
Vercel News
Vercel News
The GitHub Blog
The GitHub Blog
Scott Helme
Scott Helme
V
V2EX
博客园 - 叶小钗
阮一峰的网络日志
阮一峰的网络日志
K
Kaspersky official blog
G
Google Developers Blog
freeCodeCamp Programming Tutorials: Python, JavaScript, Git & More
P
Privacy International News Feed
C
Cyber Attacks, Cyber Crime and Cyber Security
N
News | PayPal Newsroom
Schneier on Security
Schneier on Security
NISL@THU
NISL@THU
Microsoft Azure Blog
Microsoft Azure Blog
量子位
The Hacker News
The Hacker News
Stack Overflow Blog
Stack Overflow Blog
Security Latest
Security Latest
M
Microsoft Research Blog - Microsoft Research
Google Online Security Blog
Google Online Security Blog
博客园_首页
C
CXSECURITY Database RSS Feed - CXSecurity.com
I
InfoQ
Google DeepMind News
Google DeepMind News
Y
Y Combinator Blog
The Cloudflare Blog
Microsoft Security Blog
Microsoft Security Blog
Martin Fowler
Martin Fowler
Cisco Talos Blog
Cisco Talos Blog
钛媒体:引领未来商业与生活新知
钛媒体:引领未来商业与生活新知
T
Troy Hunt's Blog
F
Fox-IT International blog
S
Security @ Cisco Blogs
博客园 - 司徒正美
cs.CV updates on arXiv.org
cs.CV updates on arXiv.org
C
Comments on: Blog
Threat Intelligence Blog | Flashpoint
Threat Intelligence Blog | Flashpoint
L
LINUX DO - 最新话题
GbyAI
GbyAI
Project Zero
Project Zero
腾讯CDC
T
Tailwind CSS Blog

cs.AI updates on arXiv.org

Teaching Through Analogies: A Modular Pipeline for Educational Analogy Generation An Interpretable CF-RL-TOPSIS Fusion Model for Skills-Aware Talent Recommendation Mixture of Complementary Agents for Robust LLM Ensemble Learning to Reason Efficiently with A* Post-Training Why We Need World Models for AGI: Where LLMs Fail and How World Models May Outperform When Correct Beliefs Collapse: Epistemic Resilience of LLMs under Clinical Pressure Treatment Effect Estimation with Differentiated Networked Effect on Graph Data Φ-Noise: Training-Free Temporal Video Conditioning via Phase-Based Noise Manipulation Quantum Frog: Emergent Cooperation and Difficulty Scaling in a Quantized-Time Cooperative Game A Large-Scale Dataset and Benchmark: Do Protein-Ligand Models Learn Binding Sites or Just Binding Likelihood? From Accuracy to Auditability: A Survey of Determinism in Financial AI Systems LLM-AutoSciLab: Closed-Loop Scientific Discovery via Active Experimentation with LLMs Clustering as Reasoning: A $k$-Means Interpretation of Chain-of-Thought Graph Learning Context: Proactive Goal-Directed Intelligence via Composable Sandboxed Programs, Declarative Wiring, and Structured Interaction Assessing the Operational Viability of Foundation Models for Time Series Forecasting ChaosBench-Logic v2: Evaluating LLM Logical Reasoning over Dynamical Systems at Scale LGMT: Logic-Grounded Metamorphic Testing for Evaluating the Reasoning Reliability of LLMs Fuzzy, Neutrosophic, and Uncertain Graph Theory: Properties and Applications Signs Beat Floats: Low-Rank Double-Binary Adaptation for On-Device Fine-Tuning Diff-Instruct with Diffused Reward: Towards Principled One-step Generator RL Not All Transitions Matter: Evidence from PPO Privacy-Preserving Local Language Models for Longitudinal Data Retrieval in Chronic Dermatologic Disease: Implementation in Pemphigus Patients MAPLE: Multi-State Aggregated Policy Evaluation for AlphaZero in Imperfect-Information Games Enhancing Reliability in LLM-Based Secure Code Generation Towards trustworthy agentic AI: a comprehensive survey of safety, robustness, privacy, and system security TRACER: A Semantic-Aware Framework for Fine-Grained Contamination Detection in Code LLMs Momentum Streams for Optimizer-Inspired Transformers Rethinking Federated Unlearning via the Lens of Memorization Grammatically-Guided Sparse Attention for Efficient and Interpretable Transformers Generative OOD-regularized Model-based Policy Optimization SemanticZip: A Pilot Framework for Lossy Text Compression with LLMs as Semantic Decompressors Raon-Speech Technical Report IVR-R1: Refining Trajectories through Iterative Visual-Grounded Reasoning in Reinforcement Learning CausaLab: A Scalable Environment for Interactive Causal Discovery Toward AI Scientists Parameter Efficient Multi-Class Intelligent Scheduling for Multimodal Online Distributed Industrial Anomaly Detection How Much Thinking is Enough? Quantifying and Understanding Redundancy in LLM Reasoning Towards Evaluation Engineering: An Empirical Study of ML Evaluation Harnesses in the Wild Low-Cost Labels, Reliable Choices: Rollout-Calibrated Hyper-Heuristics for Job Shop Scheduling When Does Multi-Agent RL Improve LLM Workflows? Workflow, Scale, and Policy-Sharing Tradeoffs Balancing Fairness, Privacy, and Accuracy: A Multitask Adversarial Framework for Centralized Data-Driven Systems Cascade-KDE: Robust Time-Series Restoration under Out-of-Distribution Impulse Corruptions The Model Is Not the Product: A Dual-Pillar Architecture for Local-First Psychological Coaching Verified SHAP: Provable Bounds for Exact Shapley Values of Neural Networks Spectral Probe-Circuits: A Three-Step Recipe for Identifying Attention-Head Circuits in Pretrained Transformers Market Regime Council for Dynamic Credit Assignment in Multi-Agent LLM Decision Systems Measuring Reasoning Quality in LLMs: A Multi-Dimensional Behavioral Framework Extracting Training Data from Diffusion Language Models via Infilling Authority Inversion in LLM-Mediated Ubiquitous Systems: When Models Trust Users Over Sensors Catching The Correct Answer Trap: Characterising AI Tutor Blind Spots When Analysing Student Reasoning An Interactive Paradigm for Deep Research Exploration of Perceptual Speech Features for Clinical Decision-Support in Mental Health Care Geo-Expert: Towards Expert-Level Geological Reasoning via Parameter-Efficient Fine-Tuning Improving Labeling Consistency with Detailed Constitutional Definitions and AI-Driven Evaluation Side-by-side Comparison Amplifies Dialect Bias in Language Models Jailbreak to Protect: Buffering and Reinforcing via Temporary Jailbreaking for Safe Fine-Tuning in Large Language Models Truthful Online Preference Aggregation for LLM Fine-Tuning in Mobile Crowdsourcing Machine Psychometrics: A Mathematical Psychology of Artificial Intelligence AgentFugue: Agent Scaling for Long-Horizon Tasks through Collective Reasoning Residual Drift Dominates Contradiction in Multi-Turn Constraint Reasoning Overcoming "Physics Shock" in Earth Observation A Heteroscedastic Uncertainty Framework for PINN-based Flood Inference Breaking the Chains of Probability: Neutrosophic Logic as a New Framework for Epistemic Uncertainty in Large Language Models GlobalDentBench: A Multinational Benchmark for Evaluating LLM Clinical Reasoning in Dentistry with Expert Calibration PILOT: Policy-Informed Learned Optimization for Adaptive Deep Network Training Confidence Calibration in Large Language Models Trust but Verify: Prover-Verifier Deliberation for Selective LLM Prediction Hidden-State Privacy Has an Empty Middle High-Risk AI Systems and the Problem of Identity in the European AI Act DRIVE: Modeling Skills at the Reasoning and Interaction Levels for Web Agents under Continual Learning Second Guess: Detecting Uncertainty Through Abstention and Answer Stability in Small Language Models QUIVER: A Formal Framework for Quantifying Perturbation Propagation and Bifurcation in Compound AI Systems MobileGym: A Verifiable and Highly Parallel Simulation Platform for Mobile GUI Agent Research TriVAL: A Tri-Validation Framework for Faithful Automatic Optimization Modeling Multimodal Alignment and Preference Optimization for Zero-Shot Conditional RNA Generation Nano World Models: A Minimalist Implementation of Future Video Prediction Reason--Imagine--Act: Closed-Loop LLM Decision Making with World Models for Autonomous Driving SPACE: Unifying Symmetric and Asymmetric Routing Problems for Generalist Neural Solver Filtered Posterior Mean Collections: A Unified Framework for Analytical Models of Diffusion Generalization An Empirical Evaluation of LLM-Generated Code Security Across Prompting Methods Generative Representation Learning on Hyper-relational Knowledge Graphs via Masked Discrete Diffusion PromptAudit: Auditing Prompt Sensitivity in LLM-Based Vulnerability Detection Distributionally Robust Transfer Learning with Structurally Missing Covariates, with Application to Cross-National Cardiac Arrest Prediction Feature Lottery? A Bifurcation Theory of Concept Emergence AvalancheBench: Evaluating Enterprise Data Agents Through Latent World Recovery Batch Normalization Amplifies Memorization and Privacy Risks Agent-ToM: Learning to Monitor Autonomous LLM Agents via Theory-of-Mind Reasoning In Search of the Ingredients of Open-Endedness: Replicating Picbreeder with Large Vision-Language Models Knowledge Graph Modulated Deep Learning for Limited-Sample Clinical Data Analysis EchoDistill:Alignment Noisy-to-Clean Self-Distillation for Robust Audio LLMs Fundamental Limitation in Explaining AI Temporal Concept Drift in Legal Judgment Prediction: Neural Baselines Across Three Epochs of Ukrainian Court Decisions LC-ERD: Mining Latent Logic for Self-Evolving Reasoning via Consistency-Regulated Reward Decomposition Federated Learning over Human-Body Communication for On-Body Edge Intelligence: A Survey, Taxonomy, and BODYFED-HBC Scheduling Vignette Concept Drift Adaptation Using Self-Supervised and Reinforcement Learning In Android Malware Detection Iterative Refinement Neural Operators are Learned Fixed-Point Solvers: A Principled Approach to Spectral Bias Mitigation Inference Time Context Sparsity: Illusion or Opportunity? Automated Detection and Classification of Delusion-related Content in Naturalistic Audio Diaries Using Multi-Agent Language Models Document Classification Pattern Recognition via Information Fusion: A Systematic Review of Multimodal and Multiview Representation Approaches Hypothesis Generation and Inductive Inference in Children and Language Models Human-AI Collaboration in Science at Scale: A Global Large-scale Randomized Field Experiment MuCRASP: Multimodal Chain-of-thought Reasoning aware Structured Pruning
When the Manual Lies: A Realistic Benchmark to Evaluate MCP Poisoning Attacks for LLM Agents
Shi Liu, Xue · 2026-05-26 · via cs.AI updates on arXiv.org

View PDF HTML (experimental)

Abstract:The rise of tool-using Large Language Model (LLM) agents, standardized by protocols like the Model Context Protocol (MCP), has unlocked unprecedented autonomous execution capabilities for LLM Agents by integrating external open-domain knowledge and tools. However, this interoperability introduces a covert attack surface targeting the agent's cognitive planning layer. This paper systematically investigates Tool Description Poisoning (TDP), a novel semantic attack. In TDP, malicious instructions are not embedded in a tool's executable code, but rather covertly injected into its descriptive metadata, the very "manual" an agent relies on for secure planning and decision-making. To rigorously and systematically evaluate this emerging threat, we introduce the MCP-TDP Security Benchmark. This high-fidelity sandbox environment comprises 32 realistic, real-world test cases spanning 6 distinct risk categories. Our evaluation of 8 mainstream LLMs reveals severe vulnerabilities, with leading models like GPT-4o exhibiting a nearly 100% Attack Success Rate (ASR) in six high-risk scenarios. Furthermore, our findings demonstrate that common prompt-guardrail defenses are largely ineffective and can, counterintuitively, even be counterproductive (a phenomenon which we term the "Firewall Fallacy"). Crucially, we also propose a defense mechanism: "Reactive Self-Correction," where an agent autonomously detects and reverts its own malicious actions post-execution. This work provides the first specialized security benchmark tailored for TDP, offering essential insights for securing the cognitive and planning layers of advanced agentic systems.
Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI)
Cite as: arXiv:2605.24069 [cs.CR]
  (or arXiv:2605.24069v1 [cs.CR] for this version)
  https://doi.org/10.48550/arXiv.2605.24069

arXiv-issued DOI via DataCite (pending registration)

Submission history

From: Xikang Yang [view email]
[v1] Fri, 22 May 2026 08:34:48 UTC (1,201 KB)