

General-purpose AI beats out specialized clinical AI in some assessments | TechTarget
Anuja Vaidya · 2026-06-15 · via WhatIs

A new study challenges the value proposition of specialized clinical AI tools, showing they underperformed compared to general-purpose AI models across medical benchmarks.

After large language models exploded on the scene in late 2022, developers rushed to explore their use in healthcare, creating clinical AI tools for healthcare-specific use cases. But now, a new study reveals that general-purpose AI can outperform specialized clinical AI on several medical benchmarks.

The study, published in Nature Medicine, tested two specialized LLM-based clinical AI tools, OpenEvidence and UpToDate Expert AI, against three general-purpose frontier LLMs: GPT-5.2, Gemini 3.1 Pro and Claude Opus 4.6. The results call into question the industry's focus on designing LLMs specifically for healthcare use cases.

Investment in specialized clinical AI is growing. Earlier this year, OpenEvidence raised $250 million in a closed series D funding round, sending its valuation skyrocketing to $12 billion. Since then, the company has expanded rapidly, releasing audio telehealth, AI coding, prescription and prioritization features.

The study authors noted that, though proprietary clinical AI tools claim to provide enhanced clinical performance over general-purpose AI, their architectures, base models and training pipelines are not publicly available. As a result, providers must assess their value and safety without independent evidence, making it harder for them to scrutinize clinical AI results against those of general-purpose tools.

Thus, researchers from NYU Langone Health and the University of Texas at Austin set out to evaluate the tools against three medical benchmarks.

The evaluation tested the AI models on three types of assessments: 500 US Medical Licensing Examination-style MedQA questions assessing medical knowledge, 500 HealthBench items evaluating agreement with expert clinicians and 100 real clinical queries (RCQ) drawn from physicians' LLM queries. Twelve clinicians conducted a randomized, blinded review of the RCQ stage.

Model performance varied, with general-purpose AI coming out on top

The general-purpose frontier AI tools outperformed the specialized clinical AI tools in all three evaluations, the study revealed.

In the MedQA questions assessment, Gemini achieved the highest accuracy at 97.4%, followed by GPT at 94.2% and Claude at 90.2%. Meanwhile, OpenEvidence achieved an accuracy of 89.6% and UpToDate achieved 88.4%.

Similarly, GPT scored highest in the HealthBench assessment, receiving a score of 88 on a 100-point scale, followed by Gemini at 79.3 and Claude at 77. Both specialized clinical AI tools scored lower: OpenEvidence at 62.6 and UpToDate at 61.3.

In the RCQ benchmark evaluation, two performance tiers emerged. The first tier, which comprised the general-purpose tools, outperformed the second tier of clinical AI tools on most individual questions, not just on average. The researchers also included Google Search AI Overview in the RCQ evaluation because it is routinely encountered by clinicians. The clinical AI tools performed comparably to the Google Search AI Overview on the RCQ. 

"Clinical AI tools may carry institutional legitimacy and are likely safe for routine use, but our results show that they are not superior to frontier models on knowledge, communication or clinical alignment," the researchers wrote.

However, the researchers are not necessarily arguing that providers should use only general-purpose AI tools. Rather, they suggest that providers develop hospital-specific LLMs that leverage institutional data and use them alongside general-purpose models for less-sensitive tasks.

Anuja Vaidya has covered the healthcare industry since 2012. She currently covers healthcare IT and innovation, including artificial intelligence, digital healthcare, EHRs and interoperability.
