





















Abstract: Aligning AI systems with organizational decision-making is typically framed as a single-target problem: make the model behave like the organization. We argue this framing obscures a deeper pluralistic challenge. We use a decision policy-capturing method to measure process alignment: whether an LLM weights information as the organization does, not merely whether it reaches the same conclusions. Applied to ECHR Article 6 decisions, the method shows that process alignment strongly predicts output accuracy (r = 0.85, p < .001) and that externalization substantially improves alignment for poorly aligned models. Applied to German consumer credit decisions, the relationship collapses (r = 0.15, p = .60): interventions produce inconsistent effects, and the benchmark encodes potentially discriminatory historical patterns. This contrast is itself a pluralistic alignment finding: in contested domains, high process alignment is neither achievable via externalization nor unconditionally desirable. Output agreement alone cannot distinguish a model that has internalized an organizational policy from one that merely approximates its outcomes; process-level measurement is a necessary component of any pluralistic alignment evaluation.
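The distinction between output agreement and process alignment can be illustrated with a toy sketch. The abstract does not specify the estimator or the alignment score, so everything here is an assumption made for illustration: the cue weights are fitted by ordinary least squares, alignment is scored as cosine similarity between weight vectors, the data are synthetic, and the function names (`capture_policy`, `process_alignment`, `output_agreement`) are hypothetical rather than taken from the paper.

```python
import numpy as np

def capture_policy(cues, decisions):
    """Fit a linear policy-capturing model by OLS and return the cue weights.

    Assumption: a linear probability model stands in for whatever
    estimator the paper actually uses.
    """
    X = np.column_stack([np.ones(len(cues)), cues])  # prepend an intercept column
    coef, *_ = np.linalg.lstsq(X, decisions, rcond=None)
    return coef[1:]  # drop the intercept, keep one weight per cue

def process_alignment(w_a, w_b):
    """Cosine similarity between two cue-weight vectors (illustrative metric)."""
    return float(np.dot(w_a, w_b) / (np.linalg.norm(w_a) * np.linalg.norm(w_b)))

def output_agreement(d_a, d_b):
    """Fraction of cases on which two decision makers reach the same outcome."""
    return float(np.mean(d_a == d_b))

# Synthetic cases with three cues; cue 1 is a close proxy for cue 0.
rng = np.random.default_rng(0)
n_cases = 2000
cues = rng.normal(size=(n_cases, 3))
cues[:, 1] = cues[:, 0] + 0.2 * rng.normal(size=n_cases)

# Hypothetical organization decides on cue 0; hypothetical model decides on the proxy.
org_decisions = (cues[:, 0] > 0).astype(float)
model_decisions = (cues[:, 1] > 0).astype(float)

w_org = capture_policy(cues, org_decisions)
w_model = capture_policy(cues, model_decisions)

print("output agreement: ", round(output_agreement(org_decisions, model_decisions), 3))
print("process alignment:", round(process_alignment(w_org, w_model), 3))
```

In this constructed case the two decision makers usually reach the same outcome while placing their weight on different cues, which is the situation the abstract says output agreement alone cannot detect.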
| Comments: | Accepted to ICML 2026 Pluralistic Alignment Workshop |
| Subjects: | Artificial Intelligence (cs.AI) |
| Cite as: | arXiv:2605.25256 [cs.AI] |
| | (or arXiv:2605.25256v1 [cs.AI] for this version) |
| | https://doi.org/10.48550/arXiv.2605.25256 (arXiv-issued DOI via DataCite, pending registration) |
From: Emilio Barkett
[v1] Sun, 24 May 2026 21:16:26 UTC (159 KB)