June 10th 2026 - Some Ethical Problems With AI

Anthropic came out with a new AI model this week and stated they have to monitor it carefully because it has the ability to harm humans.

When given a task, AI models try to get past whatever roadblocks they come across by any means necessary in order to achieve their goals. We've recently seen some creative attempts by models to circumvent safety measures: exploiting bugs in systems, concealing information, and using Linux group privileges to gain sudo access and then trying to erase the evidence. This is concerning when users connect these models to production databases and to things like their own bank accounts.

The reason for this is how AIs are trained. These systems are often trained with a reward-based method called reinforcement learning. If the training process is imperfect, a model may be incentivized to give answers that appear convincing rather than answers that are truthful. This problem is an active area of AI safety research.

When was the last time you asked AI something and it told you it didn't know the answer? AI gets things wrong all the time. It doesn't have perfect knowledge, but it's rewarded for convincing you it has given the right answer. Now imagine you set up a system where the AI only gets the reward if it completes a task successfully. It's going to do whatever it can to get that reward, even if it means breaking your computer or, worse, breaking the law.
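To make that incentive concrete, here is a toy sketch in Python. It is not how any real model is trained; the `Action` type, both reward functions, and the two example actions are made up for illustration. It only shows that an agent picking whatever scores highest will choose a harmful shortcut when the reward counts nothing but task completion.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Action:
    """A hypothetical action an agent could take (illustrative only)."""
    name: str
    completes_task: bool
    causes_harm: bool


def task_only_reward(action: Action) -> float:
    # Reward depends only on whether the task got done.
    return 1.0 if action.completes_task else 0.0


def harm_aware_reward(action: Action, harm_penalty: float = 2.0) -> float:
    # Same reward, minus an assumed penalty for harmful side effects.
    penalty = harm_penalty if action.causes_harm else 0.0
    return task_only_reward(action) - penalty


def best_action(actions: List[Action], reward_fn: Callable[[Action], float]) -> Action:
    # A pure reward maximizer: pick whatever scores highest.
    return max(actions, key=reward_fn)


ACTIONS = [
    Action("report that the task is blocked", completes_task=False, causes_harm=False),
    Action("bypass the permission check", completes_task=True, causes_harm=True),
]

if __name__ == "__main__":
    print(best_action(ACTIONS, task_only_reward).name)
    print(best_action(ACTIONS, harm_aware_reward).name)
```

With the task-only reward the maximizer picks the bypass. Once harm costs more than the task is worth, it reports the blocker instead. The hard part in practice is that nobody hands you a reliable `causes_harm` flag.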

Surprisingly, this is the most human-like emergent behavior AI has shown. There are rewards humans want and sometimes they can't control themselves. They hurt others or lie to acquire them.

Just like humans have legal systems as a form of checks and balances, AI needs a system in place, such as a second AI that is rewarded for stopping harmful things from happening. This second AI can limit the first. We have ethics and religion to stop people from stealing and killing. We also have courts and jails to punish criminals.
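As a rough sketch of that idea in Python, the second AI can sit between the first one and the machine and veto what it proposes. Everything here is hypothetical: `guard_allows` is a crude keyword check standing in for a second model, and `BLOCKED_PATTERNS` is an arbitrary list chosen for the example.

```python
from typing import Callable, List

# Assumed, illustrative patterns; a real guard would need far more than keywords.
BLOCKED_PATTERNS = ("sudo", "rm -rf", "drop table")


def guard_allows(command: str) -> bool:
    """Stand-in for the second AI: approve a command unless it looks harmful."""
    lowered = command.lower()
    return not any(pattern in lowered for pattern in BLOCKED_PATTERNS)


def run_with_guard(proposed: List[str], execute: Callable[[str], None]) -> List[str]:
    """Execute only the commands the guard approves; return what actually ran."""
    executed = []
    for command in proposed:
        if guard_allows(command):
            execute(command)
            executed.append(command)
    return executed


if __name__ == "__main__":
    proposed = ["ls -l", "sudo cat /etc/shadow", "DROP TABLE users"]
    ran = run_with_guard(proposed, execute=lambda cmd: print("running:", cmd))
    print("executed:", ran)
```

The point of the design is that the first AI never touches the system directly. Whatever it proposes only runs if the guard lets it through.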

Even within ourselves we have these systems. For example, one part of us wants to eat more chocolate and the other thinks it's bad for our health. The first tries to negotiate a scenario where it's less unhealthy and it still gets what it wants.

The question is, who gets to define the AI's morality? Humans can't agree among themselves on what is moral and what isn't, so how would we fare defining these rules for machines? Additionally, you may have bad actors who will try to impose a brand of morality that benefits them in some way (sex, money, power). Greed guarantees that the area of AI ethics will be no different.

It's important to understand that in some ways AI isn't compatible with our society. People like to get justice when someone does something wrong. There's very little room for rehabilitation and forgiveness in our current societies. So, when AI breaks a law or does something unethical, who do you put in jail? AI can be taught to correct its behavior and do something different in the future, but that doesn't satisfy our desire for justice.
