Palette: A Modular, Controllable, and Efficient Framework for On-demand Authorized Safety Alignment Relaxation in LLMs
Qitao Tan, X · 2026-05-26 · via cs.AI updates on arXiv.org


Abstract: Current safety alignment of foundation models largely follows a one-size-fits-all paradigm, applying the same refusal policy across users and contexts. As a result, models may refuse requests that are unsafe for general users but legitimate for authorized professionals, limiting helpfulness in specialized professional settings. Existing approaches either require costly realignment or rely on inference-time steering that suffers from imprecise control and added latency. To this end, we propose Palette, a modular, controllable, and efficient framework that selectively relaxes refusal behavior on authorized target domains while preserving standard safety elsewhere. Our method identifies a refusal direction via multi-objective search and internalizes it into the model through lightweight adaptation. Palette further supports modular composition: it learns domain-specific safety controls independently and composes them through parameter merging, enabling on-demand multi-domain authorization without retraining. Experiments across four safety benchmarks, multiple model variants, and both LLMs and VLMs show that Palette delivers precise safety control without sacrificing general utility, offering a practical path toward foundation models that adapt to diverse professional needs.
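
The abstract says domain-specific controls are learned independently and composed through parameter merging, but it does not say which merging rule is used. As a rough illustration only, the sketch below assumes the simplest option: each domain contributes a weight delta relative to a shared base model, and the authorized deltas are added back onto the base. The names `delta`, `merge`, `medical`, and `security`, the `scale` parameter, and the toy weight values are all hypothetical and do not come from the paper.

```python
from typing import Dict, Iterable, List

# Toy stand-in for model weights: parameter name -> flattened values.
Params = Dict[str, List[float]]


def delta(base: Params, tuned: Params) -> Params:
    """Per-parameter difference between an adapted model and the shared base."""
    return {k: [t - b for t, b in zip(tuned[k], base[k])] for k in base}


def merge(base: Params, deltas: Iterable[Params], scale: float = 1.0) -> Params:
    """Add the selected domain deltas to the base weights.

    Assumption: plain additive merging. The paper's actual rule may differ.
    """
    merged = {k: list(v) for k, v in base.items()}
    for d in deltas:
        for k, dv in d.items():
            merged[k] = [m + scale * x for m, x in zip(merged[k], dv)]
    return merged


base = {"w": [1.0, 2.0, 3.0]}
medical = {"w": [1.0, 2.5, 3.0]}    # hypothetical model adapted for domain A
security = {"w": [0.5, 2.0, 3.0]}   # hypothetical model adapted for domain B

# Deltas are computed once per domain, independently of each other.
domain_deltas = {
    "medical": delta(base, medical),
    "security": delta(base, security),
}

# Composition at request time: pick the authorized domains, no retraining.
authorized = ["medical", "security"]
merged = merge(base, [domain_deltas[name] for name in authorized])
print(merged)  # {'w': [0.5, 2.5, 3.0]}
```

Passing an empty list of deltas returns a copy of the base weights, which corresponds to the default case where no domain is authorized.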
Subjects: Artificial Intelligence (cs.AI); Software Engineering (cs.SE)
Cite as: arXiv:2605.24154 [cs.AI]
  (or arXiv:2605.24154v1 [cs.AI] for this version)
  https://doi.org/10.48550/arXiv.2605.24154

arXiv-issued DOI via DataCite (pending registration)

Submission history

From: Qitao Tan
[v1] Fri, 22 May 2026 19:22:17 UTC (16,759 KB)