Abstract: Despite significant advancements in explainable AI (XAI), scholars note a persistent lack of solid conceptual foundations and of integration with the broader scientific discourse on explanation. In response, emerging research draws on explanatory strategies from various sciences and from the philosophy of science literature to fill these gaps. This paper outlines a mechanistic strategy for explaining the functional organization of deep learning systems, situating recent developments in explainable AI within a broader philosophical context. According to the mechanistic approach, explaining opaque AI systems involves identifying the mechanisms that drive their decision making. For deep neural networks, this means discerning functionally relevant components, such as neurons, layers, circuits, or activation patterns, and understanding their roles through decomposition, localization, and recomposition. Proof-of-principle case studies from image recognition and language modeling align these theoretical approaches with mechanistic interpretability research from OpenAI and Anthropic. The findings suggest that pursuing mechanistic explanations can uncover elements that traditional explainability techniques may overlook, ultimately contributing to more thoroughly explainable AI.
| Subjects: | Machine Learning (cs.LG); Artificial Intelligence (cs.AI) |
| Cite as: | arXiv:2411.01332 [cs.LG] (or arXiv:2411.01332v5 [cs.LG] for this version) |
| DOI: | https://doi.org/10.48550/arXiv.2411.01332 (arXiv-issued DOI via DataCite) |
| Related DOI: | https://doi.org/10.1007/978-3-032-10073-3_23 (DOI linking to related resources) |
From: Marcin Rabiza
[v1] Sat, 2 Nov 2024 18:30:32 UTC (577 KB)
[v2] Thu, 16 Jan 2025 23:37:24 UTC (577 KB)
[v3] Mon, 24 Mar 2025 03:51:49 UTC (544 KB)
[v4] Tue, 25 Mar 2025 01:41:47 UTC (544 KB)
[v5] Wed, 20 May 2026 18:16:00 UTC (604 KB)