Building and evaluating model diffing agents — AI Alignment Forum
bilalchughta · 2026-06-13 · via AI Alignment Forum