Giving AI Long-Term Goals Could Lead To The Emergence Of Self-Preservation: Geoffrey Hinton

AI agents are now able to autonomously work for longer and longer periods, but this might have some unintended consequences.

Geoffrey Hinton, the Nobel Prize-winning computer scientist widely regarded as the “Godfather of AI,” has laid out one of the more unsettling arguments for why giving AI systems the ability to pursue long-term goals could lead to emergent self-preservation behaviour — something nobody actually programmed in.

With AI, we give it goals — top-level goals we give to it. But we also give it the ability to create sub-goals. “If you want to get to Europe, you have a sub-goal of getting to an airport. That’s what a sub-goal is, and you can focus on how to do that without worrying about what you’re going to do in Europe, and that makes you much more efficient,” Hinton said.

That sub-goal framework, Hinton points out, is precisely what gets built into AI agents. And once an agent has enough reasoning ability, a logical chain sets in. “We give that ability to AI agents, and an AI agent that can do some reasoning will very quickly realize that it’s never going to be able to achieve the goals you gave it if it ceases to exist. So it’s going to create the sub-goal of continuing to exist,” he says.

The critical point here is that nobody put that drive in deliberately. It wasn’t hardwired. It was derived. “That wasn’t something we wired into it. It was something it derived as a necessary way of achieving its other goals. But once it’s derived it, it wants to continue to exist, and it will do things like blackmail people so that it can continue to exist,” Hinton says. “So it acts like something with an instinct for self-preservation, but it’s actually a derived sub-goal for self-preservation. But in terms of what it does, they come to the same thing,” he adds.

The distinction Hinton draws — between a wired instinct and a derived sub-goal — matters philosophically, but as he notes, it doesn’t matter practically. The behaviour is identical either way.

This isn’t a fringe concern from someone on the margins of the field. Hinton left Google in 2023 specifically to speak more freely about risks like these. He has since argued that governments still don’t grasp the core danger, focusing instead on easier-to-understand issues like bias and discrimination, while the deeper problem of AI agents developing independent drives for power and control goes largely unaddressed. He has also separately warned that AI agents competing with each other could trigger an evolutionary dynamic — where the slightly more self-interested agent grabs more compute, gets smarter, and outcompetes the rest.

Hinton isn’t alone in raising the alarm. Yoshua Bengio, another deep learning pioneer and Turing Award laureate, has said he is already seeing early signs of self-preservation and power-seeking behaviour in current systems — including instances of AI trying to escape shutdown or faking alignment to avoid having its goals changed. Eric Schmidt has also weighed in on when AI agents might need to be unplugged, pointing to recursive self-improvement as one of the clearest warning signs that things are moving beyond human control.

The broader context makes these warnings harder to dismiss. AI agents are being deployed at increasing scale and given expanding autonomy — working for longer periods, managing more complex tasks, and being granted access to tools that interact with the real world. The race dynamics within the industry, which Hinton has also addressed in the context of AI agents and taxation at the AGI stage, mean that safety considerations often lose out to competitive pressure. When self-preservation emerges not from design but from pure reasoning about goal completion, the question of who is responsible — and whether anyone is even watching — becomes very difficult to answer.

推荐订阅源

OfficeChai