





















Abstract: Large language models are increasingly used as behavioral simulators, but it remains unclear when their outputs reflect human-like cognitive mechanisms rather than prompt-sensitive surface patterns. We study this question through the realization effect, a well-characterized finding in behavioral economics in which risk-taking differs systematically after paper versus realized gains and losses. We evaluate LLM behavior at three levels: prompt-only behavioral sensitivity, linear readout of internal representations, and causal control via activation steering. Prompt-only results show systematic condition sensitivity, but the directional pattern does not reproduce human realization-effect predictions. Gemma's residual stream contains a linearly decodable realization-status signal at layer 18 that generalizes to held-out prompts. Steering along this direction does not, however, reliably shift downstream risk choices, a null result that holds across positive scales and in a negative sign-symmetry run. Behavioral sensitivity, latent readout, and causal control are three distinct properties that do not automatically co-occur, and successful latent readout is insufficient evidence that a model behaviorally relies on a representation during downstream decision-making.
| Subjects: | Artificial Intelligence (cs.AI); Computational Engineering, Finance, and Science (cs.CE) |
| Cite as: | arXiv:2605.25151 [cs.AI] |
| | (or arXiv:2605.25151v1 [cs.AI] for this version) |
| | https://doi.org/10.48550/arXiv.2605.25151 (arXiv-issued DOI via DataCite, pending registration) |
From: Emilio Barkett
[v1] Sun, 24 May 2026 16:07:34 UTC (95 KB)