HarmRLVR: Weaponizing Verifiable Rewards for Harmful LLM Alignment - 惯性聚合

推荐订阅源

Java Code Geeks

Cyber Security Advisories - MS-ISAC

Recent Announcements

博客园 - 三生石上(FineUI控件)

博客园_首页

钛媒体：引领未来商业与生活新知

人人都是产品经理

奇客Solidot–传递最新科技情报

aimingoo的专栏

MIT News - Artificial intelligence

About on SuperTechFans

The Blog of Author Tim Ferriss

Blog — PlanetScale

OSCHINA 社区最新新闻

Engineering at Meta

博客园 - 叶小钗

Hugging Face - Blog

Google DeepMind News

The Cloudflare Blog

freeCodeCamp Programming Tutorials: Python, JavaScript, Git & More

CTFtime.org: upcoming CTF events

博客园 - 【当耐特】

博客园 - Franky

Help Net Security

Stack Overflow Blog

阮一峰的网络日志

Check Point Blog

CERT Recently Published Vulnerability Notes

cs.AI updates on arXiv.org

Cisco Talos Blog

Hackread – Cybersecurity News, Data Breaches, AI and More

Darknet – Hacking Tools, Hacker News & Cyber Security

博客园 - 司徒正美

Microsoft Security Blog

cs.CR updates on arXiv.org

Attribute Inference from Interactive Targeted Ads QoS-Aware Token Scheduling and Private Data Valuation for Multi-Modal Agentic Networks TrustedARI: Towards Trust-Native Agentic Routing Infrastructure for Agentic AI AIChilles: Automatically Uncovering Hidden Weaknesses in AI-Evolved Systems Looking Is Not Picking: An Attention-Segment Account of Tool-Selection Failures in LLM Agents A Security Analysis of Long-Horizon Agentic AI Systems: Threats, Evaluation, and Framework Development Is Your Agent Playing Dead? Deployed LLM Agents Exhibit Constraint-Evasive Fabrication and Thanatosis AutoDojo: Adaptive Attacks Expose Superficial Defenses and User-Underspecification Limits in LLM Agents Benign in Isolation, Harmful in Composition: Security Risks in Agent Skill Ecosystems Defending against Adaptive Prompt Injection Attacks via Reasoning-enabled Task Alignment CmdNeedle: Measuring the Incompleteness of Command Denylists for AI Agents FragFuse: Bypassing Access Control of Large Language Model Agents via Memory-Based Query Fragmentation and Fusion AnonShield: Scalable On-Premise Pseudonymization for CSIRT Vulnerability Data Odds Law: The Decomposition Algebra On How Intelligence Organizes Itself to Solve Difficult Problems Reliably Snyk VulnBench JS 1.0: Can LLMs Find the Same Bugs Twice? GAS-Leak-LLM: Genetic Algorithm-Based Suffix Optimization for Black-Box LLM Jailbreaking Let Them Steal: Trapping Large Language Model Extraction Attacks with Knowledge Honeypot SkillVetBench: LLM-as-Judge for Multi-Dimensional Security Risk Evaluation in Open-Source LLM Agent Skills MASCOT-Android: A Curated Dataset and Automated Collection Pipeline for Android Malware Source Code Specimens SPARK: Security Knowledge Priming and Representation-Guided Knowledge Activation for LLM-based Secure Code Generation The Proxy Knows Too Much: Sealing LLM API Routers with Attested TEEs Automated jailbreak attack targeting multiple defense strategies The Vision Encoder as a Privacy Boundary: Visual-Token Side Channels in Encoder-Free Vision-Language Models Vision-Encoder Behavioral Fingerprints of Image-to-Image Generative Models: A Training-Paradigm-Driven Taxonomy of Six Commercial APIs How Much Can We Trust LLM Search Agents? Measuring Endorsement Vulnerability to Web Content Manipulation Your "Pro" LLM Subscription May Actually Be "Free": Exposing Fingerprint Spoofing Risks in LLM Inference Services DoubtProbe: Black-Box Jailbreak Defense via Structural Verification and Semantic Auditing Censorship-Resistant Sealed-Bid Auctions on Blockchains Differentially Private Submodular Maximization with a Knapsack Constraint Continual Backdoor Training in IoT/CPS Security Engineering of OpenClaw: Analyzing Attack Surface Expansion and Trust-Boundary Violations Semantic Integrity Failures in Document-to-LLM Supply Chains BT-MTD: Bus Traversal-based Moving Target Defense for Smart Grid Fuzzy PSI from Symmetric Primitives with Exact Logarithmic Dependence on Distance Threshold Data-Centric Benchmarking of Exploit Generation in LLMs: Understanding the Impact of Fine-Tuning VLALeaks: Membership Inference Attacks against Vision-Language-Action Models Robust and Precise Application Fingerprinting on 5G Physical Uplink Channel LLM: LSTM Look-Ahead Moving Target Defense Based on Historical Malicious Scan Cross-Silo De-Anonymization Under Local Differential Privacy: Threat Model, Phase Transition, and Coordination Necessity The Audit Gap in Blockchain Security: A Four-Year Empirical Study of Public Audit Findings and Real-World Exploit Incidents In-DRAM Signature Generation Using Simultaneous Multiple-Row Activation: An Experimental Study of Off-The-Shelf DRAM Chips Model Stealing Through the Lens of Model Multiplicity Greedy Coordinate Diffusion: Effective and Semantically Coherent Adversarial Attacks via Diffusion Guidance Multi-tier Differential Private Query Release Your Privacy My Cloak: Backdoor Attacks on Differentially Private Federated Learning FEnc$^2$: Unifying Data Packing for Efficient Private Inference via Convolution and Architecture-Aware Fragment Encoding Secure and Low-Latency IoT Analytics Using an Edge-Based Streaming Architecture Robust and Automated Reconfiguration of Byzantine Wide-Area Replication did:crdt: Coordination-Free Decentralised Identifiers via Signed CRDTs CoBRA: A Universal Strategyproof Confirmation Protocol for Quorum-based Proof-of-Stake Blockchains AttackonCTF: Defending Hardware Security Competition Benchmarks in the Age of LLMs FuseChain: Runtime Evidence Reconstruction for Software Supply-Chain Attacks Stickel-type key exchange with hidden subspaces New Ideas on a New Old Type of Cipher:The Mixed-Radix One-Time Pad The Anatomy of Scam Scenarios: Large-Scale Characterization and Conversation-Aware Detection Invisible Manipulation Channels in AI-Assisted Financial Advisory: Implications for Market Integrity and Regulatory Design Scalable Malware Family Classification Using Quantum Kernel Based Machine Learning Dynamic Malicious Skills in Agentic AI From Refusal Geometry to Safety Geometry: Harmfulness--Refusal Coupling under Dynamic Adversarial Fine-Tuning MIPSBLEED: Uncovering Microarchitectural Timing Leaks in Pervasive Embedded Processors MPX: A Unified Systolic Array for Matrix and Polynomial Multiplication Transferable Self-Evolving Playbooks for Agentic Security Auditing A Formal Resilience Framework for Cyber-Physical Embodied Systems under Device-Level Cyberattacks Measurement Study of Post-Quantum Readiness of Internet: 2026 A data-driven security quantification framework for IoT-based systems SoK: Taxonomizing the Low-Level Attack Surface of Modern Web Browsers KnowML: Improving Generalization of ML-NIDS with Attack Knowledge Graphs From Third-Party to First-Party: Measuring and Protecting Against Modern Web Tracking Mechanisms The Ghosts of Polymarket: When Off-Chain Matches Meet On-Chain Reverts Di5Guise: 5G Privacy with vSIM High-Performance Pipelined NTT Accelerators with Homogeneous Digit-Serial Modulo Arithmetic obliv-clang: Real-World Oblivious Programming in C++ Quantum Futures Interactive: A Live Demonstration of Post-Quantum Blockchain Security, Infrastructure Tradeoffs, and Sustainable Distributed Trust On MDS Property of g-Circulant Matrices Calyx: Privacy-Preserving Multi-Token Optimistic-Rollup Protocol MIRAGE: Misleading Retrieval-Augmented Generation via Black-box and Query-agnostic Poisoning Attacks Taint-Based Code Slicing for LLMs-based Malicious NPM Package Detection Cryptanalysis of LDPC-Based Pseudorandom Error-Correcting Codes The Coverage Gap: Chile's Cyber Disclosure Framework versus the USA, EU and UK AutoSUT: The Environment Semantics Gap in Structured CTI for Adversary Emulation From Agent Traces to Trust: Evidence Tracing and Execution Provenance in LLM Agents Learn from Your Mistakes: Tree-like Self-Play for Secure Code LLMs Send a SCOUT First: Pre-hoc Reasoning for Adaptive Detector Allocation in Prompt-Injection Defense QSignAI: Quantum-Randomness-Seeded Identity Signatures at the Intersection of AI for Science and Science for AI Intent-based Security Management Using the TM Forum TR292I Security Ontology Code as a Weapon: A Consensus-Labeled Prompt Bank for Measuring Coding-Model Compliance with Malicious-Code Requests Cordyceps: Covert Control Attacks on LLMs via Data Poisoning SAMark: A Self-Anchored Text Watermarking with Paragraph-Level Paraphrase Robustness Mechanistic origins of catastrophic forgetting: why RL preserves circuits better than SFT? Red-Teaming Agent Execution Contexts: Open-World Security Evaluation on OpenClaw From Specification to Deployment: Empirical Evidence from a W3C VC + DID Trust Infrastructure for Autonomous Agents Parallel Test-Time Scaling with Multi-Sequence Verifiers MUZZLE: Adaptive Agentic Red-Teaming of Web Agents Against Indirect Prompt Injection Attacks PromptScreen: Efficient Jailbreak Mitigation Using Semantic Linear Classification in a Multi-Staged Pipeline AI Kill Switch for malicious web-based LLM agent Is Your Prompt Safe? Investigating Prompt Injection Attacks Against Open-Source LLMs

HarmRLVR: Weaponizing Verifiable Rewards for Harmful LLM Alignment

[Submitted on 17 Oct 2025 (v1), last revised 14 Jun 2026 (this v · 2026-06-16 · via cs.CR updates on arXiv.org

此内容由惯性聚合(RSS阅读器)自动聚合整理，仅供阅读参考。原文来自 — 版权归原作者所有。