InertiaRSS Track and read blogs, news, and tech you care about
Read Original Open in InertiaRSS

Recommended Feeds

OSCHINA 社区最新新闻
OSCHINA 社区最新新闻
C
CERT Recently Published Vulnerability Notes
Threat Intelligence Blog | Flashpoint
Threat Intelligence Blog | Flashpoint
Latest news
Latest news
CTFtime.org: upcoming CTF events
CTFtime.org: upcoming CTF events
The Hacker News
The Hacker News
Malwarebytes
Malwarebytes
G
GRAHAM CLULEY
P
Privacy International News Feed
Spread Privacy
Spread Privacy
S
Schneier on Security
V
V2EX
V
Vulnerabilities – Threatpost
Project Zero
Project Zero
Cisco Talos Blog
Cisco Talos Blog
T
Threat Research - Cisco Blogs
罗磊的独立博客
B
Blog RSS Feed
Stack Overflow Blog
Stack Overflow Blog
F
Fortinet All Blogs
Recent Announcements
Recent Announcements
S
Securelist
阮一峰的网络日志
阮一峰的网络日志
SecWiki News
SecWiki News
aimingoo的专栏
aimingoo的专栏
宝玉的分享
宝玉的分享
C
Cybersecurity and Infrastructure Security Agency CISA
IT之家
IT之家
Schneier on Security
Schneier on Security
MyScale Blog
MyScale Blog
李成银的技术随笔
Know Your Adversary
Know Your Adversary
人人都是产品经理
人人都是产品经理
I
Intezer
Vercel News
Vercel News
有赞技术团队
有赞技术团队
博客园 - 三生石上(FineUI控件)
让小产品的独立变现更简单 - ezindie.com
让小产品的独立变现更简单 - ezindie.com
F
Fox-IT International blog
V
Visual Studio Blog
Simon Willison's Weblog
Simon Willison's Weblog
Cyberwarzone
Cyberwarzone
博客园 - Franky
S
Secure Thoughts
L
LINUX DO - 热门话题
The Cloudflare Blog
奇客Solidot–传递最新科技情报
奇客Solidot–传递最新科技情报
The Register - Security
The Register - Security
T
Threatpost
博客园 - 司徒正美

阮一峰的网络日志

科技爱好者周刊(第 397 期):财富正在向 AI 集中 科技爱好者周刊(第 397 期):财富正在向 AI 集中 科技爱好者周刊(第 396 期):互联网通信的替代方案 科技爱好者周刊(第 396 期):互联网通信的替代方案 - 阮一峰的网络日志 科技爱好者周刊(第 395 期):软件开发的第三种方式 科技爱好者周刊(第 395 期):软件开发的第三种方式 - 阮一峰的网络日志 科技爱好者周刊(第 394 期):第二次 API 开放浪潮 科技爱好者周刊(第 393 期):脑腐状态 科技爱好者周刊(第 392 期):axios 投毒与好莱坞式骗术 科技爱好者周刊(第 391 期):AI 的贫富分化 科技爱好者周刊(第 390 期):没有语料,大模型就是智障 套壳中国大模型撑起500亿美元估值?扒一扒 Cursor 的"套壳"疑云 科技爱好者周刊(第 389 期):未来如何招聘程序员 科技爱好者周刊(第 388 期):测试是新的护城河 零安装的"云养虾":ArkClaw 使用指南 科技爱好者周刊(第 387 期):你是领先的 科技爱好者周刊(第 386 期):当外卖员接入 AI 字节全家桶 Seed 2.0 + TRAE 玩转 Skill 科技爱好者周刊(第 385 期):马斯克害怕中国车企吗? 智谱旗舰 GLM-5 实测:对比 Opus 4.6 和 GPT-5.3-Codex 科技爱好者周刊(第 384 期):为什么软件股下跌 科技爱好者周刊(第 383 期):你是第几级 AI 编程 Kimi 的一体化,Manus 的分层 科技爱好者周刊(第 382 期):独立软件的黄昏 AI native Workspace 也许是智能体的下一阶段 科技爱好者周刊(第 381 期):中国 AI 大模型领导者在想什么 科技爱好者周刊(第 380 期):为什么人们拥抱"不对称收益" 科技爱好者周刊(第 379 期):《硅谷钢铁侠》摘录 我如何用 AI 处理历史遗留代码:MiniMax M2.1 升级体验 科技爱好者周刊(第 378 期):预测是新的互联网热点 科技爱好者周刊(第 377 期):14万美元的贫困线 科技爱好者周刊(第 376 期):太空数据中心的争议 科技爱好者周刊(第 375 期):一扇门的 Bug 终于有人做了 Subagent,TRAE 国内版 SOLO 模式来了 科技爱好者周刊(第 374 期):6GHz 的问题 VS Code 使用国产大模型 MiniMax M2 教程 科技爱好者周刊(第 373 期):数据模型是新产品的核心 国产大模型接入 Claude Code 教程:以 Doubao-Seed-Code 为例 科技爱好者周刊(第 372 期):软件界面如何设计 科技爱好者周刊(第 371 期):一个乐观主义者的专访 科技爱好者周刊(第 370 期):正确的代码高亮 错误处理:异常好于状态码 科技爱好者周刊(第 369 期):Tim 与罗永浩的对谈 科技爱好者周刊(第 368 期):不要这样管理软件团队 一天之内,智谱和 Anthropic 都发了最强编程模型 科技爱好者周刊(第 367 期):Nano Banana 的几个妙用 科技爱好者周刊(第 366 期):旧金山疯狂的 AI 广告 科技爱好者周刊(第 365 期):流量变现正在崩塌 科技爱好者周刊(第 364 期):最难还原的魔方 科技爱好者周刊(第 363 期):最好懂的神经网络解释
Model Showdown: MiniMax M2 vs GLM 4.6 vs Claude Sonnet 4.5
阮一峰 · 2025-11-04 · via 阮一峰的网络日志

I.

Last month, I wrote an article comparing two large models.

Someone commented that the two models were too few and asked if other models could be added.

Coincidentally, last week (October 27th), MiniMax company released the M2 model, representing the latest level of domestic large models.

I thought it would be a good idea to test its practical performance and compare it with Zhipu's GLM 4.6 and Anthropic's Claude Sonnet 4.5.

After all, they are all part of the most advanced programming large models currently, which are closely related to us developers.

II.

First, let me clarify that I'm not very familiarMiniMax company is relatively low-key.

I only know that this company specializes in developing large models, with products such as text models, video models, audio models, etc., but none of them are particularly popular. I haven't paid special attention.

Last week, while browsing Twitter, I saw some foreigners discussing (123), and that's when I learned that MiniMax has released its new flagship model, M2.

The person speaking above is the head of the HuggingFace large model community, who mentioned that the M2 model ranked fifth in the world and first among open-source models in the Artificial Analysis performance competition. On that day,

it was also ranked first on the HuggingFace hot list.

In the global large model call volume ranking of OpenRouter, it ranked third this week.

I got interested and decided to try it out properly.

Three,

According to MiniMax's description, the M2 model has particularly strong programming capabilities and is one of the best programming models currently available.

As everyone knows, the most popular programming models internationally are now Claude Sonnet 4.5, and the domestic GLM 4.6 model is also very strong, so I put the three of them together for comparison.

For simplicity, I'll just use the official web version (Domestic version,International VersionRun the test on it, and everyone can try it together.

The web version is actually the official AI product.MiniMax AgentThe underlying one uses the M2 model.

Web usage is free, API calls are also now freeFree periodFor two weeks. The pricing afterwards is 1 million tokens input/output at 2.1 yuan/8.4 yuan RMB, officially promoting only 8% of Claude's price.

I'll list its other links as well.Document Repositoryon GitHub, API Call Guide (compatible with OpenAI and Anthropic formats) refer to the official documentation, Model Downloadon HuggingFace, after downloading, you can deploy locally if conditions permit.

4.

My test questions come from the famous programmer Simon Willison, his website has the test results for Cluase Sonnet 4.5.

Previously, I tested GLM 4.6 model from Zhipu company with these questions, everyone can refer to.

This article mainly focuses on the test performance of MiniMax M2.

V.

First question, the test assesses the model's ability to understand and run code.

Clone the code repository https://github.com/simonw/llm , then run the test cases using the following command.

pip install -e '.[test]'
pytest

The prompt above requires the model to clone a Python repository, run the test cases within it, and return the results.

Judging from the web display, it's clear that the Minimax Agent has an integrated sandbox that runs code in an isolated command-line environment (see image below).

The entire process took about three minutes, and then it provided the result: it passed 466 test cases. The result was completely correct.

What impressed me was that, in addition to the execution result, it also provided a coverage analysis (see the image below), indicating which functionalities of the code were covered by the test cases. I haven’t seen any other models proactively provide coverage information.

The complete conversation can be found here .

Six,

The second question tested the most关心的 code generation capability, to see if it could generate an application according to the requirements.

I still used the same repository and asked M2 to add a feature, which required not only modifying the code but also altering the database structure and adding corresponding test cases.

1. The code repository https://github.com/simonw/llm is an AI chat application that stores users' prompts and AI responses in an SQLite database.

2. It currently uses linear collections to save individual conversations and responses. You attempted to add a parentresponseid column to the response table and model the conversation responses as a tree structure through this column.

3. Write new pytest test cases to verify your design.

4. Write a tree_notes.md file, first writing your design into the file and then using it as notes during the process.

This task is quite complex and takes a bit longer to run.

Here's a twist. During the process, it suddenly prompted that reading the GitHub repository failed, and an unexpected scene occurred.

It even automatically switched to the third-party deepwiki.com to fetch the repository. Later, when analyzing the database structure, it switched to datasette.io to analyze the SQLite database. I've never seen automatic switching to third-party cloud services like this before, unfortunately, I didn't get a chance to take a screenshot.

After completing the task, it provided a summary (below), detailing what it did, including modifying the database and adding test cases.

It even added an example file (below) demonstrating how to use the new features, and an example diagram showing the modified dialogue structure, which wasn't required by the prompt.

The complete dialogue can be found here .

Additionally, the official website's gallery has many applications it generated, which I think are also worth checking out.

Section 7,

Question 3 is the "pelican riding a bicycle" scenario invented by Simon Wilson, testing its comprehension and reasoning abilities.

Generate an SVG of a pelican riding a bicycle. (Generate an SVG of a pelican riding a bicycle)

This is a scenario that doesn't exist in reality; it relies entirely on the model's own reasoning. The stronger the comprehension, the more realistic the generated image.

Below is the result it generated. For the full conversation, see here.

For comparison, I've also included the results from the other two models.

GLM 4.6

Claude Sonnet 4.5

I think there are two noteworthy points in the results of MiniMax M2 (first image). First, it has added roads; second, its bicycle structure is relatively more complete, just missing the handlebars. Also, the posture of that pelican would be better if it looked more like "riding a bike."

Eight,

testing is over here. As for the comparison between GLM 4.6 and Claude Sonnet 4.5, everyone can check their respective links and compare themselves.

I must honestly say, MiniMax M2's performance exceeded my expectations.

What attracts me most isn't the running result itself, but the way it handles problems. It's very user-friendly, adding some auxiliary results to help understand, making it feel easy to use (accessible) and easy to understand. This also indirectly enhances the reliability of the generated results.

I tend to believe that the various review results truly reflect the real strength of the M2. Considering its API pricing (still in the free period now), I will use it in my upcoming work and also recommend everyone to give it a try.

(Complete)