Within a day, Zhipu and Anthropic both released their strongest programming models
阮一峰 · 2025-10-01 · via 阮一峰的网络日志

1.

The last day before the holiday (September 30) was a busy one.

In the morning, Anthropic announced the Claude Sonnet 4.5 model.

In the afternoon, Zhipu AI released the GLM-4.6 model.

I think this matters a great deal to programmers.

Both models are among the most advanced AI coding models currently available.

If you want AI to generate code, they are the first choices. In other words, within a single day, AI coding models reached a new level.

2.

The opening of Anthropic's announcement unabashedly makes three "world's best" claims.

"Claude Sonnet 4.5 is the world's best coding model. It is the most powerful model for building complex agents. It is the best model for using computers. It shows significant progress in reasoning and mathematics."

Zhipu's announcement was no less bold.

"We have once again broken through the boundaries of large model capabilities.

GLM-4.6 is our strongest coding model (a 27% improvement over GLM-4.5). It achieves comprehensive improvements in real-world programming, long-context processing, reasoning ability, information search, writing ability, and agent applications."

To convince people, Zhipu's announcement also provides detailed test results.

The figure above shows results on 8 benchmarks. Blue bars represent GLM-4.6 and green bars represent GLM-4.5. The comparison models are the newly released DeepSeek V3.2 Exp, Claude Sonnet 4, and Claude Sonnet 4.5.

The blue bars mostly rank near the top, and in some cases first. Zhipu also claims that GLM-4.6 is economical with tokens (that is, it saves money), "saving more than 30% compared to GLM-4.5, with the lowest cost among similar models."

Its conclusion is therefore: "GLM-4.6 is on par with Claude Sonnet 4/Claude Sonnet 4.5 in some rankings, firmly ranking first among domestic models."

This is interesting: one claims to be the "best coding model in the world," and the other claims to be "firmly ranking first among domestic models."

Below, I test how GLM-4.6 compares with Claude Sonnet 4.5.

3.

Note that comparing these two models is not just an exercise; it has practical significance.

Anthropic's products are strong, but it restricts Chinese users, and users in China cannot access its services through normal channels. It is also a paid model, and not a cheap one: $3 per million input tokens and $15 per million output tokens.
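To make the $3 / $15 per-million-token pricing concrete, here is a minimal sketch of the arithmetic. The function name and the token counts in the example are hypothetical, chosen only for illustration; real usage varies widely by task.

```python
# Prices quoted above: USD per one million tokens.
INPUT_PRICE_PER_M = 3.0
OUTPUT_PRICE_PER_M = 15.0


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one request at the quoted per-million-token prices."""
    input_cost = input_tokens / 1_000_000 * INPUT_PRICE_PER_M
    output_cost = output_tokens / 1_000_000 * OUTPUT_PRICE_PER_M
    return input_cost + output_cost


# Hypothetical session: 200,000 tokens in, 50,000 tokens out.
print(f"${estimate_cost_usd(200_000, 50_000):.2f}")
```

For that hypothetical session the estimate comes to about $1.35, most of it from output tokens, which are priced five times higher than input tokens.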

In contrast, GLM-4.6 is a fully domestic model from the Beijing-based company Zhipu. It is thoroughly open source (MIT License): the model code is fully open and free to use.

You can also install it yourself at home. However, its hardware requirements are too high for home machines, so it is generally used as a cloud service.

Currently, using GLM-4.6 through the web interface on Zhipu's official websites (BigModel and Z.ai) is free.

API calls are paid, and the entry-level package (coding plan) appears to be 20 RMB per month.

It also has full Chinese-language support (documentation and customer service), which Anthropic lacks.

In short, the purpose of my test is to see whether it is really as strong as officially claimed, and whether it can replace the Claude Sonnet model.

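4.

My testing method is simple. Anthropic invited the well-known programmer Simon Willison to try the Claude Sonnet 4.5 model in advance.

Simon Willison has already published his results on his website.

I take several of his tests, run them on GLM-4.6, and compare the results.

You can follow along: open the official website and paste in the prompts (preferably in English) to get a more direct feel for it.

AI terminal tools (such as Claude Code, Cline, OpenCode, and Crush) can also be used; see the official documentation for how to configure them (API access needs to be enabled first).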

5.

The first test.

Pull the code repository https://github.com/simonw/llm , then run the test cases using the following commands.

pip install -e '.[test]'

pytest

This test needs network access to fetch the code, and it runs on the server side.
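The steps above (fetch the repository, install it with its test extras, run pytest) can be sketched as a small Python script. This is only an illustration of the sequence, not what either model actually executed; the function names are hypothetical, and it assumes `git` is on the PATH and that network access is available.

```python
import subprocess
import sys
from pathlib import Path

REPO_URL = "https://github.com/simonw/llm"


def build_steps(workdir: Path) -> list[tuple[list[str], Path]]:
    """Return (command, working directory) pairs for clone, install, and test."""
    repo = workdir / "llm"
    return [
        # Fetch the code (requires network access).
        (["git", "clone", REPO_URL, str(repo)], workdir),
        # Editable install including the "test" extras, as in the command above.
        ([sys.executable, "-m", "pip", "install", "-e", ".[test]"], repo),
        # Run the test suite from the repository root.
        ([sys.executable, "-m", "pytest"], repo),
    ]


def run_steps(workdir: Path) -> None:
    """Run each step in order, stopping at the first failure."""
    for command, cwd in build_steps(workdir):
        subprocess.run(command, cwd=cwd, check=True)
```

Calling `run_steps(Path("some/empty/directory"))` performs the three steps in order; `check=True` raises `subprocess.CalledProcessError` as soon as a step exits with a non-zero status.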

Like Claude, the web interface on Zhipu's official website provides server-side Python and Node.js sandbox environments in which generated code can be executed directly.

I have omitted the intermediate reasoning steps; the final result is shown in the figure below (see the complete conversation on the official website).

278 test cases passed in 18.31s.

The whole process (pulling the code, installing dependencies, executing commands) is the same as with Claude Sonnet. Strangely, Claude Sonnet ran 466 test cases, 188 more than GLM-4.6 did. I don't know why.

6.

The second test is a more complex programming task. The original prompt was in English, and I translated it into Chinese.

1. The code repository https://github.com/simonw/llm is an AI chat application that stores user prompts and AI responses in an SQLite database.

2. It currently stores conversations and responses as a linear collection. Try adding a parent_response_id column and use it to model the responses in a conversation as a tree structure.

3. Write new pytest test cases to verify your design.

4. Write a tree_notes.md file: first write your design into the file, then use it as a notebook during the process.
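To illustrate what the prompt is asking for, here is a minimal sketch of modeling responses as a tree through a self-referencing `parent_response_id` column. The table layout, sample rows, and helper names are hypothetical and heavily simplified; this is not the actual schema of the llm repository, nor the code either model produced.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE responses (
        id INTEGER PRIMARY KEY,
        conversation_id INTEGER NOT NULL,
        -- NULL marks the root of a conversation tree.
        parent_response_id INTEGER REFERENCES responses(id),
        prompt TEXT NOT NULL,
        response TEXT NOT NULL
    )
    """
)
# One conversation that branches after the first response.
conn.executemany(
    "INSERT INTO responses VALUES (?, ?, ?, ?, ?)",
    [
        (1, 1, None, "hello", "hi"),
        (2, 1, 1, "tell a joke", "joke A"),
        (3, 1, 1, "tell a story", "story B"),
        (4, 1, 2, "another joke", "joke C"),
    ],
)


def path_to_root(conn: sqlite3.Connection, response_id: int) -> list[int]:
    """Return the response ids from the root down to response_id."""
    sql = """
        WITH RECURSIVE chain(id, parent_response_id, depth) AS (
            SELECT id, parent_response_id, 0 FROM responses WHERE id = ?
            UNION ALL
            SELECT r.id, r.parent_response_id, chain.depth + 1
            FROM responses r JOIN chain ON r.id = chain.parent_response_id
        )
        SELECT id FROM chain ORDER BY depth DESC
    """
    return [row[0] for row in conn.execute(sql, (response_id,))]


def children(conn: sqlite3.Connection, response_id: int) -> list[int]:
    """Return the ids of the direct replies to response_id."""
    sql = "SELECT id FROM responses WHERE parent_response_id = ? ORDER BY id"
    return [row[0] for row in conn.execute(sql, (response_id,))]
```

With the sample rows, response 1 has two children (2 and 3), so the conversation is no longer a single linear list, and the recursive query walks from any response back up to the root to reconstruct one branch of the conversation.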

You can view the complete conversation history.

GLM-4.6 ran for a few minutes, continuously outputting generated code. In the end, it modified the script, added API and command-line interface calls, and wrote and ran test cases, which passed.

It also generated a tree_notes.md file containing a detailed description of the changes.

You can compare its results with the results from Claude Sonnet.

In terms of results, the two differ little: both meet the requirements of the prompt and both produce runnable code. The differences lie mainly in implementation details, which can only be judged by reading the code closely.

7.

The third test is Simon Willison's own signature test: have the AI generate an SVG image of a pelican riding a bicycle (Generate an SVG of a pelican riding a bicycle).

This scene does not exist in reality and has no reference to draw on, so it tests the model's imagination and generative ability.

Below is the image generated by GLM-4.6 with deep thinking enabled.

Below is the image generated by Claude Sonnet 4.5 with deep thinking enabled.

The two results are quite similar, except that the beak in Claude's image is more prominent, which makes the bird easier to recognize as a pelican.

8.

That concludes the testing. To sum up, I think GLM-4.6 is a very strong domestic model with excellent coding ability, and it can serve as a substitute for Claude Sonnet, currently recognized as the strongest model.

It is well rounded and can handle tasks beyond coding; it also responds quickly, is cheap, and offers very good value for money.

(End)