Towards Steering without Sacrifice: Principled Training of Steering Vectors for Prompt-only Interventions
Yuntai Bao, · 2026-05-08 · via cs.LG updates on arXiv.org

Abstract: Recently, steering vectors (SVs) have emerged as an effective and lightweight approach to steer behaviors of large language models (LLMs), among which fine-tuned SVs are more effective than optimization-free ones. However, current approaches to fine-tuned SVs suffer from two limitations. First, they require careful selection of steering factors on a per-SV basis to balance steering effectiveness and generation quality at inference time. Second, they operate as full-sequence SVs (FSSVs), which can sacrifice generation quality regardless of factor selection due to excessive intervention on the model generation process. To address the first limitation, we propose joint training of steering factors and directions, such that post-hoc factor selection is no longer required. Using neural network scaling theory, we find that moderately large initialization sizes and learning rates for steering factors are essential for stability and efficiency of joint training. To tackle the second limitation, we draw inspiration from representation fine-tuning and introduce Prompt-only SV (PrOSV), an SV that intervenes only on a few prompt tokens. Our empirical results show that PrOSV outperforms traditional FSSVs on AxBench when using our joint training scheme. We also find that PrOSV achieves a better tradeoff between general model utility and adversarial robustness than FSSV.
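
The difference between a full-sequence SV and a prompt-only SV can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `steer`, the toy shapes, and the choice to intervene on the last two prompt tokens are all assumptions made for illustration, since the abstract does not say which prompt positions PrOSV uses. The sketch only shows where a scaled direction is added to a matrix of hidden states.

```python
import numpy as np

def steer(hidden, direction, factor, positions=None):
    """Add factor * direction to hidden states at selected token positions.

    hidden:    (seq_len, d_model) array standing in for layer activations
    direction: (d_model,) steering direction
    factor:    scalar steering factor
    positions: token indices to intervene on; None means every position,
               which corresponds to a full-sequence SV
    """
    steered = hidden.copy()
    if positions is None:
        steered += factor * direction
    else:
        steered[positions] += factor * direction
    return steered

# Toy setup (hypothetical sizes): 5 prompt tokens followed by 3 generated tokens.
rng = np.random.default_rng(0)
seq_len, d_model, prompt_len = 8, 4, 5
hidden = rng.normal(size=(seq_len, d_model))
direction = rng.normal(size=d_model)
factor = 2.0

# Full-sequence SV: every position is shifted.
full = steer(hidden, direction, factor)

# Prompt-only SV (assumed here: the last two prompt tokens).
prompt_positions = np.arange(prompt_len - 2, prompt_len)
prompt_only = steer(hidden, direction, factor, positions=prompt_positions)

# Which rows were modified by each variant.
changed_full = np.any(full != hidden, axis=1)
changed_prompt = np.any(prompt_only != hidden, axis=1)
print(changed_full)
print(changed_prompt)
```

In a real transformer, the prompt-only edit would still influence generated tokens indirectly through attention over the modified prompt positions; the sketch omits that and treats the factor as a fixed number rather than a jointly trained parameter.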
Comments: 63 pages, 50 figures; accepted by ICML 2026
Subjects: Machine Learning (cs.LG)
Cite as: arXiv:2605.05983 [cs.LG]
  (or arXiv:2605.05983v1 [cs.LG] for this version)
  https://doi.org/10.48550/arXiv.2605.05983

arXiv-issued DOI via DataCite (pending registration)

Submission history

From: Yuntai Bao
[v1] Thu, 7 May 2026 10:31:12 UTC (1,565 KB)