

Building and evaluating model diffing agents — AI Alignment Forum
bilalchughta · 2026-06-13 · via AI Alignment Forum
