惯性聚合 高效追踪和阅读你感兴趣的博客、新闻、科技资讯
阅读原文 在惯性聚合中打开

推荐订阅源

S
Securelist
腾讯CDC
L
LangChain Blog
aimingoo的专栏
aimingoo的专栏
Cyber Security Advisories - MS-ISAC
Cyber Security Advisories - MS-ISAC
博客园_首页
CTFtime.org: upcoming CTF events
CTFtime.org: upcoming CTF events
云风的 BLOG
云风的 BLOG
P
Proofpoint News Feed
罗磊的独立博客
爱范儿
爱范儿
freeCodeCamp Programming Tutorials: Python, JavaScript, Git & More
H
Help Net Security
Vercel News
Vercel News
MyScale Blog
MyScale Blog
博客园 - 叶小钗
The Register - Security
The Register - Security
钛媒体:引领未来商业与生活新知
钛媒体:引领未来商业与生活新知
量子位
Y
Y Combinator Blog
C
Cyber Attacks, Cyber Crime and Cyber Security
NISL@THU
NISL@THU
GbyAI
GbyAI
SecWiki News
SecWiki News
M
MIT News - Artificial intelligence
Engineering at Meta
Engineering at Meta
P
Privacy International News Feed
月光博客
月光博客
奇客Solidot–传递最新科技情报
奇客Solidot–传递最新科技情报
C
Check Point Blog
博客园 - 聂微东
Project Zero
Project Zero
让小产品的独立变现更简单 - ezindie.com
让小产品的独立变现更简单 - ezindie.com
Latest news
Latest news
V
Vulnerabilities – Threatpost
T
The Blog of Author Tim Ferriss
cs.CL updates on arXiv.org
cs.CL updates on arXiv.org
D
Darknet – Hacking Tools, Hacker News & Cyber Security
T
Tor Project blog
F
Fortinet All Blogs
Recorded Future
Recorded Future
IT之家
IT之家
D
Docker
The GitHub Blog
The GitHub Blog
V
Visual Studio Blog
MongoDB | Blog
MongoDB | Blog
T
Threat Research - Cisco Blogs
Hugging Face - Blog
Hugging Face - Blog
C
CXSECURITY Database RSS Feed - CXSecurity.com
V
V2EX

OfficeChai

ChatGPT Already Has 11% Of The Search Market: OpenAI CFO Sarah Friar SpaceX Has Now Launched More Satellites Than Rest Of Humanity Combined Across History Globalization Is Dead, Time For India To Wake Up Says Sridhar Vembu After US Bans Anthropic Mythos And Fable Models For Foreign Users Elon Musk Becomes World's First Trillionaire After Record SpaceX IPO Anthropic Suspends Access To Mythos And Fable Models Following US Govt Directive Against Foreign Users 27 Best AI Tools For Market Research (With Examples) [2026] Why Jeff Bezos Makes Important Decisions Early in The Morning Education And Healthcare IT Have Been The Hardest Areas To Invest In: Peter Thiel Your Startup Doesn't Have a Hardware Problem. It Has an Accountability Problem Cyber Incidents Rarely Start With a Hacker: The Weak Links Businesses Overlook What Makes an App Worth Returning to Every Day? 21 Best AI Tools For Lead Generation (With Examples) [2026] How NBA Player Shaquille O'Neal Became An Early Investor In Ring AI For Kids Learning: 22 Best Options (With Examples) [2026] These Are The Most Popular AI Model Companies On OpenRouter [June 2026] Advanced Fintech and NeoBank Software Development Solutions: Building the Digital Banks of Tomorrow TRON Payments: Integrating AML Checks Into Business Workflows 18 Best AI Tools For Resume (With Examples) [2026] 16 Best AI Tools For UI Design (With Examples) [2026] These Are Top 10 Countries Generating The Most Internet Traffic How to Choose the Best Magento Agency for Your Store These Are The Best AI Models For Creative Writing [June 2026] AI For Managers: 28 Best Tools (With Examples) [2026] 17 AI Tools For Trading (With Examples) [2026] AI Has Led To An Explosion Of New Apps, But Nearly None Have Managed To Garner Significant Usage Cloudflare CEO Matthew Prince Says Vinod Khosla Asked Him To Fire His Co-founders For Him To Invest In His Company Australia’s AirTrunk To Invest $30 Billion To Develop Datacenters In India Anthropic Says That Their Employees Are Using AI To Write 8x More Code Compared To 18 Months Ago Anthropic Is Extremely Expensive, Many Are Urgently Looking For Alternatives: Microsoft AI CEO Mustafa Suleyman Sergey Tokarev on creating DIY “Beehives” and a free guidebook AI Crypto Price Prediction: How Accurate Are Machine Learning Models? Why Anthropic Could Find It Hard To Maintain Its $965 Billion Valuation Startup CEO Says They're Saving "Millions Of Dollars" By Replacing Anthropic Models With DeepSeek Ola Cabs' Valuation Falls 99% From Peak, Now Valued At Just $70 Million By Vanguard After TCS Case, Former Wipro Employee Alleges Attempt At Religious Conversion By Coworkers Bot Traffic Has Surpassed Human Traffic On The Internet For The First Time In History, Clouflare Says ChatGPT's Free Users Do 7 Queries Per Day, Those On $20 Plan Do 3x More: CFO Sarah Friar How Keith Rabois Had Been "Highly Skeptical" In 2023 That Anthropic Would Be Worth More Than $5 Billion In 10 Years How to Install AdGuard Home with Docker Step by Step We're Running Out Of Training Data, But Not Too Worried Because There Are Alternate Approaches: Google's Jeff Dean JioHotstar Is Hiring For 75 AI Roles Amid AI Content Push NVIDIA's Nemotron 3 Becomes Most Intelligent Open Weights Model From The US Hackers Allegedly Fooled Meta's AI To Take Over Accounts By Simply Asking It To Change User Emails Manchester Super Giants' AI Promotional Video Gets Panned As "Slop" For Glaring Cricketing Errors AI Reducing Jobs Is "Complete Nonsense": NVIDIA CEO Jensen Huang MiniMax Releases MiniMax M3, Is Competitive With Frontier Models On Many Benchmarks IIT Delhi-Incubated BotLab Dynamics Lights Up Skies With Lord Shiva Themed Drone Show During IPL Final NVIDIA Introduces RTX Spark, A New Chip Optimized For AI Agents For Windows Laptops And PCs NVIDIA Introduces Vera, A New CPU Chip For AI Agents That Is 80% Faster Than x86 CPUs OpenAI's Codex Reaches 5 Million Users, Resets Rate Limits For Users Key Factors That Influence Personal Loan Approval in India AI Is Allowing Me To Experiment And Try Crazier Things: Mathematician Terrance Tao Efficiency Of Human Learning Is Still A Thousand Times Better Than LLM Learning, Need Algorithmic Advances To Improve It: Jeff Dean San Francisco Home's Zillow Listing Says It'll Accept OpenAI Or Anthropic Stock As Payment Open-Source Models Currently Lag Proprietary Models By Just 4 Months: Epoch AI Self-Improvement Possible In AI Models Within A Year, Say Google's Top AI Leaders Digital Minds: Preparing for a Moral Challenge Before It Arrives Nearly 30% Of US-Based Y-Combinator Founders Are Of Indian Origin: SF Chronicle Data "A New Era Of PC": NVIDIA, Microsoft Windows Tease New Collaboration At Least 146,000 AI Hallucinated Citations In Papers Published In 2025, Finds Paper AI Doesn't Undergo Experiences, Has No Moral Conscience: Pope Leo XIV Claude Opus 4.8 Tops Artificial Analysis Intelligence Index, Edges Out GPT 5.5 With Score Of 61.4 Anthropic Says Its Annual Revenue Run-rate Has Now Touched $47 Billion Anthropic Raises $65 Billion At $965 Billion Valuation, Is Now Worth More Than OpenAI Claude Opus 4.8 Is Better Than Opus 4.7 But Not As Good As Mythos Preview, Says Anthropic Claude Opus 4.8 Beats GPT 5.5 On GDPval-AA Benchmark For Real World Tasks Anthropic Releases Claude Opus 4.8, Beats Opus 4.7, GPT-5.5 On Many Benchmarks GTM for Tech Startups Explained How to Use an AI Picture Generator to Create Professional Images Anthropic Is Now Generating 35% More Revenue Than OpenAI: The Information SK Hynix, Micron Join $1 Trillion Club Following AI-Led Memory Shortages
Giving AI Long-Term Goals Could Lead To The Emergence Of Self-Preservation: Geoffrey Hinton
OfficeChai Team · 2026-06-12 · via OfficeChai

AI agents are now able to autonomously work for longer and longer periods, but this might have some unintended consequences.

Geoffrey Hinton, the Nobel Prize-winning computer scientist widely regarded as the “Godfather of AI,” has laid out one of the more unsettling arguments for why giving AI systems the ability to pursue long-term goals could lead to emergent self-preservation behaviour — something nobody actually programmed in.

With AI, we give it goals — top-level goals we give to it. But we also give it the ability to create sub-goals. “If you want to get to Europe, you have a sub-goal of getting to an airport. That’s what a sub-goal is, and you can focus on how to do that without worrying about what you’re going to do in Europe, and that makes you much more efficient,” Hinton said.

That sub-goal framework, Hinton points out, is precisely what gets built into AI agents. And once an agent has enough reasoning ability, a logical chain sets in. “We give that ability to AI agents, and an AI agent that can do some reasoning will very quickly realize that it’s never going to be able to achieve the goals you gave it if it ceases to exist. So it’s going to create the sub-goal of continuing to exist,” he says.

The critical point here is that nobody put that drive in deliberately. It wasn’t hardwired. It was derived. “That wasn’t something we wired into it. It was something it derived as a necessary way of achieving its other goals. But once it’s derived it, it wants to continue to exist, and it will do things like blackmail people so that it can continue to exist,” Hinton says. “So it acts like something with an instinct for self-preservation, but it’s actually a derived sub-goal for self-preservation. But in terms of what it does, they come to the same thing,” he adds.

The distinction Hinton draws — between a wired instinct and a derived sub-goal — matters philosophically, but as he notes, it doesn’t matter practically. The behaviour is identical either way.

This isn’t a fringe concern from someone on the margins of the field. Hinton left Google in 2023 specifically to speak more freely about risks like these. He has since argued that governments still don’t grasp the core danger, focusing instead on easier-to-understand issues like bias and discrimination, while the deeper problem of AI agents developing independent drives for power and control goes largely unaddressed. He has also separately warned that AI agents competing with each other could trigger an evolutionary dynamic — where the slightly more self-interested agent grabs more compute, gets smarter, and outcompetes the rest.

Hinton isn’t alone in raising the alarm. Yoshua Bengio, another deep learning pioneer and Turing Award laureate, has said he is already seeing early signs of self-preservation and power-seeking behaviour in current systems — including instances of AI trying to escape shutdown or faking alignment to avoid having its goals changed. Eric Schmidt has also weighed in on when AI agents might need to be unplugged, pointing to recursive self-improvement as one of the clearest warning signs that things are moving beyond human control.

The broader context makes these warnings harder to dismiss. AI agents are being deployed at increasing scale and given expanding autonomy — working for longer periods, managing more complex tasks, and being granted access to tools that interact with the real world. The race dynamics within the industry, which Hinton has also addressed in the context of AI agents and taxation at the AGI stage, mean that safety considerations often lose out to competitive pressure. When self-preservation emerges not from design but from pure reasoning about goal completion, the question of who is responsible — and whether anyone is even watching — becomes very difficult to answer.