Training Meta-Surrogate Model for Transferable Adversarial Attack
Yunxiao Qin, Yuanhao Xiong, Jinfeng Yi, Cho-Jui Hsieh · 2021-09-05 · via cs.CR updates on arXiv.org

We consider adversarial attacks on a black-box model when no queries are allowed. In this setting, many methods attack surrogate models directly and transfer the resulting adversarial examples to fool the target model. Many previous works have investigated which kinds of attacks on the surrogate model generate more transferable adversarial examples, but their performance is still limited by the mismatch between the surrogate models and the target model. In this paper, we tackle this problem from a novel angle: instead of using the original surrogate models, can we obtain a Meta-Surrogate Model (MSM) such that attacks on this model transfer more easily to other models? We show that this goal can be mathematically formulated as a well-posed (bi-level-like) optimization problem, and we design a differentiable attacker to make training feasible. Given one or a set of surrogate models, our method can thus obtain an MSM such that adversarial examples generated on the MSM enjoy excellent transferability. Comprehensive experiments on CIFAR-10 and ImageNet demonstrate that by attacking the MSM, we can obtain stronger transferable adversarial examples that fool black-box models, including adversarially trained ones, with much higher success rates than existing methods. The proposed method reveals significant security challenges of deep models and is a promising candidate to serve as a state-of-the-art benchmark for evaluating the robustness of deep models in the black-box setting.
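
The abstract does not spell out the bi-level-like objective, so the following is only one plausible way to write it down. All notation here is assumed for illustration: $M_\theta$ for the MSM with parameters $\theta$, $F_1, \dots, F_N$ for the given surrogate models, $L$ for a classification loss, and an $\ell_\infty$ budget $\epsilon$ (the choice of norm is also an assumption).

```latex
% Hedged sketch: symbols and the l_inf constraint are assumptions,
% not taken from the abstract.
\[
\begin{aligned}
\max_{\theta}\quad
  & \mathbb{E}_{(x,y)}\Big[\sum_{i=1}^{N}
      L\big(F_i(x + \delta_\theta(x,y)),\, y\big)\Big] \\
\text{where}\quad
  & \delta_\theta(x,y) \approx
      \operatorname*{arg\,max}_{\|\delta\|_\infty \le \epsilon}
      L\big(M_\theta(x+\delta),\, y\big)
\end{aligned}
\]
```

Read this way, the inner problem is the attack on the MSM, and the outer problem adjusts $\theta$ so that the resulting adversarial examples also raise the loss of the surrogate models. Because the outer gradient has to flow through $\delta_\theta$, the attack procedure must be differentiable with respect to $\theta$, which is consistent with the abstract's mention of a differentiable attacker.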