





















Abstract: High-confidence errors in large language models are often treated as fragile failures. We study an alternative: some errors may be false fixed points that are locally stable, internally coherent, and confidently wrong. This separates robustness from truth-tracking. We develop the separation through a Kantian commitment-gate framing and a minimal linear feedback model in which stability and correctness can diverge. Across three open-weight models, overconfident wrong items are not systematically more locally fragile than confidently correct items under our hidden-state sensitivity probes. Abstention-aware self-critique reduces overconfident wrong commitments by sacrificing coverage, and C3-R, a rule-based explicit feedback gate, sharpens that tradeoff rather than eliminating it. These results motivate, but do not establish, high signal-to-noise (high-SNR) inertia and representational compression as possible mechanisms for stable miscalibration.
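The claim that stability and correctness can diverge can be illustrated with a toy scalar feedback map. This is only a sketch under assumptions: the abstract does not specify the paper's actual model, so the update rule `x <- a*x + b`, the parameter values, and the `truth` target below are hypothetical choices made for illustration.

```python
def iterate(a, b, x0, steps=200):
    """Run the scalar linear feedback map x <- a*x + b for a fixed number of steps."""
    x = x0
    for _ in range(steps):
        x = a * x + b
    return x


def fixed_point(a, b):
    """Closed-form fixed point of x <- a*x + b (requires a != 1)."""
    return b / (1.0 - a)


# Hypothetical numbers, chosen only for illustration (not taken from the paper).
a, b = 0.5, 1.0   # |a| < 1, so perturbations around the fixed point decay
truth = 0.0       # assumed "correct" value the state ought to track

x_star = fixed_point(a, b)                  # where the feedback loop settles
x_perturbed = iterate(a, b, x_star + 0.3)   # nudge the state, then let it relax

is_stable = abs(a) < 1                      # stability depends only on a
is_correct = abs(x_star - truth) < 1e-6     # correctness depends on b / (1 - a)
```

In this toy setting the state returns to `x_star` after a perturbation, so the fixed point is robust, yet `x_star` differs from `truth`. Stability is governed by `a` alone, while correctness depends on where the fixed point sits, which is one simple way the two properties can come apart.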
| Comments: | 27 pages, 8 figures, v3.0 |
| Subjects: | Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG) |
| Cite as: | arXiv:2510.14925 [cs.AI] (or arXiv:2510.14925v4 [cs.AI] for this version) |
| DOI: | https://doi.org/10.48550/arXiv.2510.14925 (arXiv-issued DOI via DataCite) |
From: Akira Okutomi
[v1] Thu, 16 Oct 2025 17:40:28 UTC (149 KB)
[v2] Mon, 3 Nov 2025 12:53:06 UTC (158 KB)
[v3] Sun, 14 Dec 2025 11:13:00 UTC (825 KB)
[v4] Sat, 23 May 2026 12:15:45 UTC (1,397 KB)