Moderator's Principle of Least Surprise
Czynski · 2026-05-22 · via LessWrong
There is a rule in the design of software and user interfaces called the Principle of Least Surprise. It says…
This content was automatically aggregated by 惯性聚合 (an RSS reader) and is provided for reference only. Copyright belongs to the original author.