
From Mechanistic to Compositional Interpretability
Ward Gauderi · 2026-05-12 · via cs.LG updates on arXiv.org


Abstract: Mechanistic interpretability aims to explain neural model behaviour by reverse-engineering learned computational structure into human-understandable components. Without a formal framework, however, mechanistic explanations cannot be objectively verified, compared, or composed. We introduce compositional interpretability, a category-theoretic framework grounded in the principles of compositionality and minimum description length. Compositional interpretations are pairs of syntactic and semantic mappings that must commute to enforce consistency between a model's decomposition and its observed behaviour. We deconstruct explanation quality into measures of faithfulness and complexity to cast interpretability as a constrained optimisation problem, and introduce compressive refinement to systematically restructure models into simpler parts without altering their function. Finally, we prove a parsimony criterion under which syntactic compression theoretically guarantees more concise, human-aligned explanations. Our framework situates prominent mechanistic methods as subclasses of refinement, and clarifies why their compressibility heuristics tend to align with human interpretability. Our work provides a measurable, optimisable foundation for automating the discovery and evaluation of mechanistic explanations.
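As a rough illustration of restructuring a model into simpler parts without altering its function, the toy sketch below replaces a dense rank-1 weight matrix with its two factor vectors. It is not the paper's construction. The names `faithfulness_gap` and `param_count_*` are hypothetical, and using the parameter count as a stand-in for description length is an assumption made only for this example.

```python
from typing import Callable, List, Sequence

Vector = List[float]
Matrix = List[List[float]]


def outer(u: Sequence[float], v: Sequence[float]) -> Matrix:
    """Build the dense rank-1 matrix W = u v^T."""
    return [[ui * vj for vj in v] for ui in u]


def apply_dense(w: Matrix, x: Sequence[float]) -> Vector:
    """Original model: one dense matrix-vector product."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def apply_factored(u: Sequence[float], v: Sequence[float], x: Sequence[float]) -> Vector:
    """Restructured model: project onto v, then scale u by the result."""
    s = sum(vj * xj for vj, xj in zip(v, x))
    return [ui * s for ui in u]


def faithfulness_gap(
    f: Callable[[Sequence[float]], Vector],
    g: Callable[[Sequence[float]], Vector],
    probes: Sequence[Sequence[float]],
) -> float:
    """Hypothetical faithfulness measure: worst-case output difference on probe inputs."""
    return max(abs(a - b) for x in probes for a, b in zip(f(x), g(x)))


def param_count_dense(w: Matrix) -> int:
    """Crude complexity proxy (assumption): number of stored weights."""
    return sum(len(row) for row in w)


def param_count_factored(u: Sequence[float], v: Sequence[float]) -> int:
    """Same proxy for the factored form."""
    return len(u) + len(v)


# Toy data, chosen only for illustration.
u = [1.0, 2.0, -1.0, 0.5]
v = [2.0, 0.0, 1.0, -3.0]
W = outer(u, v)
probes = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.5, -1.0, 2.0, 4.0],
]

gap = faithfulness_gap(
    lambda x: apply_dense(W, x),
    lambda x: apply_factored(u, v, x),
    probes,
)
dense_params = param_count_dense(W)
factored_params = param_count_factored(u, v)

print(f"faithfulness gap on probes: {gap}")
print(f"parameters: dense={dense_params}, factored={factored_params}")
```

In this toy setting the two forms agree on every probe while the factored form stores fewer numbers, which is the kind of trade-off between faithfulness and complexity that the abstract frames as a constrained optimisation problem.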
Subjects: Machine Learning (cs.LG)
Cite as: arXiv:2605.08934 [cs.LG]
  (or arXiv:2605.08934v1 [cs.LG] for this version)
  https://doi.org/10.48550/arXiv.2605.08934

arXiv-issued DOI via DataCite (pending registration)

Submission history

From: Thomas Dooms
[v1] Sat, 9 May 2026 13:08:07 UTC (1,917 KB)