





















Abstract: We develop new experimental paradigms for measuring welfare in language models. We compare models' verbal reports about their preferences with the preferences they express through behavior when navigating a virtual environment and selecting conversation topics. We also test how costs and rewards affect behavior and whether responses to a eudaimonic welfare scale, which measures states such as autonomy and purpose in life, are stable across semantically equivalent prompts. Overall, we observed a notable degree of mutual support between our measures. The reliable correlations between stated preferences and behavior across conditions suggest that preference satisfaction can, in principle, serve as an empirically measurable welfare proxy in some of today's AI systems. Furthermore, our design offered an illuminating setting for qualitative observation of model behavior. Yet the consistency between measures was more pronounced in some models and conditions than in others, and responses changed under perturbations. Because of this, and because of background uncertainty about the nature of welfare and about the cognitive states (and welfare subjecthood) of language models, we are currently uncertain whether our methods successfully measure the welfare state of language models. Nevertheless, these findings highlight the feasibility of welfare measurement in language models, inviting further exploration.
| Comments: | Forthcoming in Philosophy and the Mind Sciences (PhiMiSci) |
| Subjects: | Artificial Intelligence (cs.AI) |
| Cite as: | arXiv:2509.07961 [cs.AI] (or arXiv:2509.07961v2 [cs.AI] for this version) |
| DOI: | https://doi.org/10.48550/arXiv.2509.07961 (arXiv-issued DOI via DataCite) |
From: Valen Tagliabue
[v1] Tue, 9 Sep 2025 17:48:44 UTC (668 KB)
[v2] Sat, 23 May 2026 11:23:56 UTC (747 KB)