Cognitive Security as an AI Safety Cause Area — LessWrong
jsteinhardt · 2026-05-26 · via LessWrong