AI Model Release Tracker: Microsoft AI's first reasoning model arrives
Written by Radhika Rajkumar, Senior Editor · June 2 · 2026-05-29 · via Latest news
Image credit: Elyse Betters Picaro/ZDNET


AI labs are shipping new models nonstop. Beyond being better and faster than its predecessor, a new model isn't guaranteed to be a major step change, however poetically the company's PR may describe it. Model strengths really emerge in context: Where are competitor models lacking or excelling? Which models have outstanding specialties, and which are just catching up to industry standards?

Also: How we test AI at ZDNET

Our Model Release Tracker helps you make sense of where models stand relative to each other, and whether they're worth a deeper look. While we don't test every model or model update on this list, we'll always include the key elements you need to know, along with our hands-on expert test, where applicable. We also include an Expert Score for certain models. Curious about how we test AI? Check out this breakdown of our process.

Here are some of the biggest model releases of 2026 so far and what to know about them. We'll update this list whenever a notable new model arrives. 


MAI-Thinking-1 

Microsoft AI | June 2, 2026

What it does: At its Build developer conference, Microsoft said that this new 35-billion-parameter model is, unsurprisingly, designed for multi-step agentic tasks. On the SWE Bench Pro coding benchmark, it scored about the same as Anthropic's Opus 4.6. The company also noted that enterprise users can trust this model for any use because it was trained only on clean, commercially safe data -- an important tidbit given mounting AI copyright lawsuits.

Also: Microsoft's first reasoning model is one of 7 AIs just released at Build - what we know so far

Why it matters: This is the first reasoning model from Microsoft AI, a notable milestone for any AI lab, but especially so this late in the race. Most labs have created multiple advanced reasoning models at this point, so it's not yet clear where Microsoft's approach stands relative to those, though the company is focusing on appealing to enterprise clients specifically.


Claude Opus 4.8 

Anthropic | May 28, 2026

What it does: Replacing Opus 4.7 starting May 28 (at the same price), Opus 4.8 offers faster thinking modes for one-third the cost of the earlier version, according to Anthropic. Like most of Anthropic's models, 4.8 prioritizes coding abilities, scoring higher than 4.7 on two coding benchmarks but not fully besting OpenAI's GPT-5.5. It also "reaches new highs on our measures of prosocial traits like supporting user autonomy and acting in the user's best interest," the company noted in the release, though definitions for what that means remain murky.

Also: I compared Claude Opus 4.8 with 4.7 in a 10-round honesty test - and a legal prompt broke it

Why it matters: Anthropic has always prioritized model safety and interpretability, but appears to be further emphasizing that standard with this release. The company said Opus 4.7 had a 92% honesty rate, in addition to being less sycophantic and hallucination-prone overall. That it claims 4.8 shows "substantially" lower rates of misalignment than 4.7 indicates an increasingly high standard for model safety, especially because Anthropic compared 4.8's alignment to that of Mythos Preview.


Gemini 3.5 Flash

Google | May 19, 2026

What it does: At Google I/O, the company launched its 3.5 model family, designed for building agents (in line with where the whole AI industry is focused at the moment). The first model available now is Gemini 3.5 Flash, which beat Gemini 3.1 Pro on a few coding and agentic benchmarks and now undergirds AI Mode in Search and the Gemini App as the default model. Like all Flash models, it's optimized for speed and a cheaper, more lightweight user experience, but can still handle "long-horizon" agentic tasks, Google noted. 3.5 Pro should roll out in June.

Also: Anthropic launches Opus 4.8, with honesty as its killer feature

Why it matters: The most important place 3.5 will operate is in Search. As Google doubles down on an AI-first search experience, the models running it shape the responses people will passively consume every day, meaning they have to be especially up to snuff on accuracy and hallucinations. Notably, though, the System Card for 3.5 Flash lacks any mention of hallucination rate or sycophancy.


GPT-5.5 Instant  

OpenAI | May 5, 2026

What it does: OpenAI said in its announcement that this lighter version of its just-released GPT-5.5 is less verbose than its predecessor, GPT-5.3 Instant. It also touted fewer hallucinations and improved factuality, saying "GPT‑5.5 Instant produced 52.5% fewer hallucinated claims than GPT‑5.3 Instant on high-stakes prompts covering areas like medicine, law, and finance."

Also: Anthropic's Mythos is evolving faster than expected, reports AI safety agency

Why it matters: GPT-5.5 Instant replaces GPT-5.3 Instant as the default model in ChatGPT. Again, while the expectation is that each new AI model gets more efficient and easier to use, and makes up less stuff, a significant improvement in hallucinations for a model most people use for fast queries could mean less misinformation spreading among the masses. That's especially critical given how many people are using ChatGPT for everyday health questions, for example.

(Disclosure: Ziff Davis, ZDNET's parent company, filed an April 2025 lawsuit against OpenAI, alleging it infringed Ziff Davis copyrights in training and operating its AI systems.)


Nemotron 3 Nano Omni 

Nvidia | April 28, 2026

What it does: The latest in Nvidia's open Nemotron family, this model provides agents with multimodal input. That means they can "perceive and reason across visual, audio, and textual inputs within a single shared perception‑to‑action loop," according to Nvidia, thereby unifying multiple capabilities into a single system. 

Also: AI is an arms race, and the US wants $9 billion in Nvidia superchips to keep up

Why it matters: Normally, systems of agents need to use separate models for speech, vision, and text, meaning they jump across documents, video, and audio to complete multi-step tasks. That slows down workflows, undermines the context agents gather, and racks up inference costs. Nvidia's approach, if it works, would streamline this process and reduce token use, saving you money. The model is available to try on Hugging Face.


GPT-5.5  

OpenAI | April 23, 2026 

Expert Score: 93/100

What it does: ZDNET tester-in-residence David Gewirtz technically gave GPT-5.5 an A- score, but said it "can be reductively described as better and faster than GPT-5.4," which is hopefully the bare-minimum expectation for a new model. Specifically, though, the model got better at agentic coding, clearly identifying concepts, scientific research, and factual accuracy. 

Also: I put GPT-5.5 through a 10-round test: It scored 93/100, losing points only for exuberance

Why it matters: While the model itself may not be leaps and bounds ahead of its immediate predecessor, the quick turnaround from 5.4 to 5.5 -- less than two months -- indicates how rapidly agentic coding is accelerating OpenAI's model release cycle. As ZDNET's David Gewirtz breaks down, the company, much like other frontier labs using AI to build AI, is shipping updates at an exponentially increasing rate.


ChatGPT Images 2  

OpenAI | April 23, 2026 

What it does: Soon after sunsetting Sora, its generative video model and social platform, OpenAI somewhat confusingly announced Images 2. ZDNET model tester David Gewirtz got an early look at Images 2 before its release and was impressed. While he didn't give this model a formal Expert Score, he said it's fun, a huge leap, and actually useful for work.

Why it matters: OpenAI seemed to be getting out of the more consumer-minded AI product game when it discontinued Sora, having been beaten by Anthropic at securing lucrative enterprise contracts. That OpenAI still came out with Images 2 amid that redirection indicates that it sees image generators as relevant enough to enterprise AI -- especially on the heels of Anthropic's Claude Design.


Claude Opus 4.7 

Anthropic | April 16, 2026 

What it does: Arriving relatively quickly after Opus 4.6, this model boasts new highs in honesty, along with reduced sycophancy and fewer hallucinations. It also appears to have a knack for cybersecurity, as it backs the new Claude Security, released shortly after the model itself -- but no, it's not Mythos, as many suspected.

Also: Anthropic's new Claude Security tool scans your codebase for flaws - and helps you decide what to fix first

Why it matters: Hallucinations and honesty are among the hardest problems to solve, plaguing even the best models. Significant gains in those areas, if Anthropic's claims hold, would be no small feat for an AI lab that takes safety seriously.


Claude Mythos (Preview) 

Anthropic | April 7, 2026 

What it does: This is a tough one because Mythos isn't actually available to the public. Anthropic created quite a media storm when it positioned the new general-purpose model as too powerful to release as usual. While the model is apparently a step change from earlier Anthropic models, the company was especially alarmed because of the security threat it posed, stating that "it is strikingly capable at computer security tasks." 

In response to that, Anthropic spearheaded Project Glasswing, a collaborative effort with several rival AI labs, including Google, Nvidia, and Microsoft, as well as security authorities like Palo Alto Networks, "to help secure the world's most critical software, and to prepare the industry for the practices we all will need to adopt to keep ahead of cyberattackers." 

Also: Apple, Google, and Microsoft join Anthropic's Project Glasswing to defend world's most critical software

Why it matters: If we're to believe Anthropic's guidance that Mythos poses a significant threat to the world's software -- so much so that only a select few partners can access it -- cybersecurity apparatuses as they stand may not be prepared to meet the rapidly evolving frontier of model capabilities. Mythos may not be the only model of its caliber, but simply the first of many to come once other labs achieve similar breakthroughs. 

For now, just a few weeks into its release, Mythos is helping catch software bugs in droves. 


GPT-5.4  

OpenAI | March 5, 2026 

What it does: OpenAI framed this new model, released barely three months after GPT-5.2, as specifically designed for professional work. According to the company's own testing (which should always be taken with a grain of salt until verified by a third party), GPT-5.4 matches or outperforms human professionals 83% of the time. 

Why it matters: As AI companies focus more on gaining enterprise trust (and contracts) while lauding what agentic AI can do, they need models that can handle complex work-related tasks with minimal risk and delay, and without prohibitively high costs. Any model advancement that shows prowess in professional workflows has a better chance of being taken seriously by companies struggling to adopt AI, though nothing guarantees seamless integration.

Also: OpenAI's new GPT-5.4 clobbers humans on pro-level work in tests - by 83%


Claude Opus 4.6 

Anthropic | Feb. 5, 2026 

What it does: This model quickly redefined the standard for autonomous agentic work, especially for coding. That's no surprise given Anthropic's authority in building models especially adept at programming tasks. Opus 4.6 also demonstrated improvement in complex, longer-running tasks overall. 

Why it matters: Opus 4.6's ability to handle tasks better on its own means you can reliably offload more of your workflow to it -- something agentic offerings usually struggle with. 

Also: Anthropic says its new Claude Opus 4.6 can nail your work deliverables on the first try


GPT-5.3-Codex  

OpenAI | Feb. 5, 2026 

What it does: This new coding model -- which OpenAI said helped build and debug itself -- can be interrupted and redirected mid-task, which, if true, is a huge boon for developers using it on complex or shifting projects with tons of trial-and-error. GPT-5.3-Codex also boasts run times of over a day and a better grasp on user intent. 

Also: OpenAI's new Spark model codes 15x faster than GPT-5.3-Codex - but there's a catch

Why it matters: OpenAI is trying to catch up to Anthropic's lead in agentic coding (and, coincidentally or not, released GPT-5.3-Codex on the same day Anthropic launched Opus 4.6). While ZDNET experts often prefer Claude Code to other tools for vibe coding, OpenAI's rumored shift toward enterprise clients and away from fun consumer tools could eventually close that gap.
