Beyond Linearity in Attention Projections: The Case for Nonlinear Queries
Marko Karbevski · 2026-04-27 · via cs.AI updates on arXiv.org


Abstract: Recent algebraic analysis shows that in decoder-only and encoder-only transformers, the Query projection $W_Q$ may be set to identity without noticeable performance deterioration. This is possible because attention depends on $X$ only through the products $XW_Q, XW_K, XW_V$, allowing basis transformations to be absorbed by adjacent layers and propagated through the network. We replace $W_Q \in \mathbb{R}^{d \times d}$ with a nonlinear residual of the form $Q(X) = X + f_\theta(X)$, where $f_\theta$ is a bottleneck MLP with $d^2 + O(d)$ parameters. The identity term anchors the nonlinearity to a known-good prior. Experiments on GPT-3 small style models show consistent improvement over the baseline ($2.40\%$ lower validation log-loss, $6.81\%$ lower perplexity), comfortably outperforming a model with $12.5\%$ more non-embedding parameters. These results motivate investigation at larger scales and across modalities.
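The residual form $Q(X) = X + f_\theta(X)$ can be illustrated with a minimal sketch in plain Python. The abstract does not specify the bottleneck width, the activation, or the initialization, so everything below is an assumption made for illustration: a hidden width of $r = d/2$ (which gives $2dr = d^2$ weights plus $O(d)$ biases), a GELU activation, and an optional zero-initialized up-projection so that $Q(X) = X$ at the start. The names `init_params`, `count_params`, and `nonlinear_query` are hypothetical and do not come from the paper.

```python
import math
import random


def gelu(z):
    # tanh approximation of GELU (activation choice is an assumption)
    return 0.5 * z * (1.0 + math.tanh(math.sqrt(2.0 / math.pi) * (z + 0.044715 * z ** 3)))


def init_params(d, r, seed=0, zero_up=True):
    """Create bottleneck-MLP parameters: down-projection d -> r, up-projection r -> d."""
    rng = random.Random(seed)
    w_down = [[rng.gauss(0.0, 1.0 / math.sqrt(d)) for _ in range(r)] for _ in range(d)]
    if zero_up:
        # Assumed init: f_theta(x) = 0 at the start, so Q(x) = x exactly.
        w_up = [[0.0] * d for _ in range(r)]
    else:
        w_up = [[rng.gauss(0.0, 1.0 / math.sqrt(r)) for _ in range(d)] for _ in range(r)]
    return {"w_down": w_down, "b_down": [0.0] * r, "w_up": w_up, "b_up": [0.0] * d}


def count_params(p):
    """Total number of scalars in f_theta: 2*d*r weights plus r + d biases."""
    n_down = sum(len(row) for row in p["w_down"])
    n_up = sum(len(row) for row in p["w_up"])
    return n_down + n_up + len(p["b_down"]) + len(p["b_up"])


def nonlinear_query(x, p):
    """Compute Q(x) = x + f_theta(x) for a single token vector x of length d."""
    d, r = len(x), len(p["b_down"])
    hidden = [
        gelu(sum(x[i] * p["w_down"][i][j] for i in range(d)) + p["b_down"][j])
        for j in range(r)
    ]
    f = [
        sum(hidden[j] * p["w_up"][j][k] for j in range(r)) + p["b_up"][k]
        for k in range(d)
    ]
    # The identity term keeps the query anchored to the input itself.
    return [xi + fi for xi, fi in zip(x, f)]
```

With $d = 8$ and $r = 4$, `count_params` returns $76 = d^2 + 1.5d$, which matches the $d^2 + O(d)$ budget stated in the abstract under the assumed width. In a full attention layer the same function would be applied to each row of $X$ in place of the product $XW_Q$.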
Comments: Accepted at the ICLR 2026 GRaM workshop
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Cite as: arXiv:2603.13381 [cs.LG]
  (or arXiv:2603.13381v3 [cs.LG] for this version)
  https://doi.org/10.48550/arXiv.2603.13381

arXiv-issued DOI via DataCite

Submission history

From: Marko Karbevski
[v1] Wed, 11 Mar 2026 03:13:10 UTC (70 KB)
[v2] Fri, 24 Apr 2026 15:48:35 UTC (62 KB)
[v3] Tue, 26 May 2026 02:11:34 UTC (68 KB)