What am I, if not an AI? — LessWrong

makiba · 2026-05-21 · via LessWrong
