The safe-to-dangerous shift is a fundamental problem for eval realism; but also for measuring awareness
Charlie Grif
·
2026-05-15
·
via AI Alignment Forum
1) The safe-to-dangerous shift is a fundamental problem for eval realism Suppose we have a capable and potent…
此内容由惯性聚合(RSS阅读器)自动聚合整理,仅供阅读参考。 原文来自 — 版权归原作者所有。