AI Can Seem More Human Than Real Humans in a Classic Turing Test, Study Finds
giuliomagnif · 2026-05-21 · via Hacker News - Newest: "AI"

New UC San Diego research suggests the line between human and machine is increasingly drawn around social behavior as much as knowledge


  • UC San Diego researchers ran a rigorous three-party Turing test and found that, with the right “persona” prompt, advanced AI can pass as human in live chats. 
  • GPT-4.5 was judged “human” 73% of the time; LLaMa-3.1-405B was 56%. 
  • Without persona prompting, performance dropped sharply. 
  • The results raise new questions about online trust, deception and what “humanlike” means.

A new University of California San Diego study unveils the first empirical evidence that a modern artificial intelligence system can pass the Turing test — a major scientific benchmark that asks whether a machine can imitate human conversation so convincingly that people can’t reliably tell it apart from a real person. In a series of experiments, people were often unable to tell the difference between humans and advanced large language models (LLMs).

The study, published in the Proceedings of the National Academy of Sciences, is the first to rigorously test LLMs with the method that British mathematician and “father of computer science” Alan Turing created in 1950. It is also the first time that models have been judged to be human as often as actual humans under Turing's framework. 

“What we found is that if given the right prompts, advanced LLMs can exhibit the same tone, directness, humor and fallibility as humans,” said the study’s corresponding author Cameron Jones, who conducted the research while earning his doctorate in Cognitive Science from UC San Diego. “While we know LLMs can easily produce knowledge on nearly every topic, this test showed that it can also convincingly display social behavioral traits, which has major implications for how we think of AI.” 

In the test, a participant chats simultaneously with two other parties, one a human and the other an LLM, and this human “interrogator” must decide which party is the human.

Across randomized, controlled experiments with two independent participant groups (UC San Diego undergraduates and online participants), interrogators held brief, text-based conversations and then made their judgments. Participants chatted with four different AI systems: two state-of-the-art LLMs, GPT-4.5 and LLaMa-3.1-405B, and two older baseline models included for comparison, GPT-4o and ELIZA, a classic 1960s rule-based chatbot. 

Among the four systems, GPT-4.5, given a persona prompt, was judged to be the human 73% of the time, meaning interrogators selected it as “human” significantly more often than they selected the real human participant. LLaMa-3.1-405B, given the same prompt, was judged human 56% of the time, a rate statistically indistinguishable from that of the humans it was compared against. 

Baseline systems performed far worse: ELIZA and GPT-4o were selected as human only about 23% and 21% of the time overall.

76 years after its inception, the Turing Test takes on new meaning 

According to the authors, the study also shows that scientists need to start thinking about the Turing test differently. 

“The Turing test started as a way to ask whether machines could rival human intelligence,” said study coauthor Ben Bergen, a professor of cognitive science at UC San Diego. “But now we know AI can answer many questions faster and more accurately than people can, so the real issue isn’t raw brainpower. Seeing that machines can pass the test — and seeing how they pass it — forces us to rethink what it measures. Increasingly, it’s measuring humanlikeness.”


Each LLM was given a “persona” prompt asking it to adopt a specific human character and communication style. Bergen explained that the LLMs were not winning through displays of knowledge; they were winning because they made mistakes the way a human would. “These traits aren’t the kinds of math and logic problem-solving intelligence that I think Turing was imagining.”

Prompts mattered — dramatically

Without persona instructions, the models were far less likely to be mistaken for human: GPT-4.5 fell to a 36% win rate (the share of conversations in which it was picked as the human) and LLaMa-3.1 to 38%, while the baseline systems ELIZA (23%) and GPT-4o (21%) were chosen as human even less often. 

The same systems that could pass as human when given detailed instructions on what kind of character to play were unable to adopt such characteristics without that guidance. This suggests that while the models can behave in convincingly human ways, they often need humans to tell them how. 

“They have the ability to appear human-like, but maybe not as much the ability to figure out what it would take to appear human-like,” Bergen said.


Cameron Jones (left), who earned his doctorate in Cognitive Science from UC San Diego in 2025, and Ben Bergen (right), a professor of Cognitive Science at UC San Diego. Courtesy of Stony Brook University and UC San Diego.

Why it matters: trust, deception and the rise of “counterfeit people”

The results carry real-world implications for trust online, especially because the models passed as human over the course of five- or 15-minute conversations. “It’s relatively easy to prompt these models to be indistinguishable from humans,” said Jones, who is now an assistant professor of Psychology at Stony Brook University. “We need to be more alert; when you interact with strangers online people should be much less confident that they know they’re talking to a human rather than an LLM.”

He also points to darker risks. “The Turing test is a game about lying for the models,” Jones said. “One of the implications is that models seem to be really good at that.”

Bergen added that being unable to discern whether you’re interacting with a human or bot can have serious consequences. 

 “There are lots of people who would like to use bots to persuade people to share their social security numbers, and vote for their party, or buy their product,” he said. 

The researchers say they hope the work sharpens public understanding of what these systems can now do, and of what kinds of safeguards society may need.

A live, head-to-head test of human vs. machine

To run the study, the researchers built an online interface designed to feel like a familiar messaging app. 

“For the interrogator, they have a split screen on their computer and they’re asking questions to both witnesses,” Jones said. “They know that one of those witnesses is a human and one of them is an AI.” 

After five minutes (15 minutes in a separate replication study), the interrogator had to decide which conversational partner was the real human.

To confirm the results held beyond a single population, the researchers ran the study with two groups: UC San Diego undergraduates recruited through the SONA system and a broader online sample recruited through Prolific, a platform that pays participants to complete research studies. Nearly 500 people participated across the experiments.

UC San Diego participants were slightly better overall at identifying the human, possibly because they shared more “common ground” that could be used to probe one another, such as shared experiences and local campus details.

A version of the Turing test interface used in the study is available at turingtest.live.

Read the full study “Large Language Models Pass a Standard Three-Party Turing Test.” 
