Zhipu's flagship GLM-5 tested: a comparison with Opus 4.6 and GPT-5.3-Codex
阮一峰 (Ruan Yifeng) · 2026-02-12 · via 阮一峰的网络日志 (Ruan Yifeng's Weblog)

I. Introduction

Zhipu's new flagship model, GLM-5, was officially released just now.

The team really pushed hard: it shipped right before the long holiday, less than two months after the previous version, GLM-4.7.

GLM-4.x was well received both in China and abroad and is widely regarded as a top-tier coding model, so people are naturally curious about what the new major version improves.

Actually, the team contacted me last week to join the beta test, so I've already been using the model for several days.

Coincidentally, two flagship models from abroad also got new versions last week: Anthropic released Claude Opus 4.6, and OpenAI released GPT-5.3-Codex.

All three new models focus on programming, so I couldn't resist running comparative tests to see how they differ. I suspect many readers are curious about this too.

Below are the results of running real programming tasks on the three models.

II. Introduction to GLM-5

According to the official release notes, GLM-5 is an open-source model that fully competes with top-tier proprietary models, with improvements focused on two specific areas.

(1) Complex System Engineering

GLM-5 is good not only at generating front-end web pages but also at back-end tasks, system refactoring, and deep debugging, moving away from the pattern of "prioritizing front-end looks over low-level logic."

It has strong self-reflection and error-correction abilities: it can autonomously analyze logs, identify root causes, and iterate on fixes until the system runs smoothly.

(2) Long-horizon Agent Tasks

It can handle long-horizon tasks, i.e., complex tasks with many stages and steps. It can break down requirements on its own, run continuously for hours, and keep its context coherent and its goals consistent.

(3) Summary

The tasks GLM-5 can accomplish go beyond generating front-end UIs. It can generate large, complex, system-level projects such as operating system kernels, browser engines, and JavaScript engines like V8.

Its slogan is: "As large models enter the era of agents and large-scale tasks, GLM-5 is the open-source option that is ready to use."

III. Testing Methods

I used the same test tasks that Alejandro AO, a developer advocate at Hugging Face, used to test Opus 4.6 and GPT-5.3.

He recorded a video showing how these two models performed.

I then gave GLM-5 the same tasks and compared its results with his.

There are four tasks in total, covering both front-end and back-end work. I have put the original prompts and scripts in a repository on GitHub.

IV. Web Design Test

The first test covers web design, specifically redesigning an existing page.

The original page was very basic.

It simply sorted the information into categories and stacked them one after another. The AI was asked to redesign the page so that it is attractive and easy to use and conveys a mature, reliable, professional feel.

As mentioned, the prompt and the original file are on GitHub, so I won't repeat them here. You can run them yourself or try them on other models.

Here is GLM-5's result.

The result is both attractive and professional: all the information is well organized, and there are animation effects. It also displays without problems on mobile (see below), so it is practically ready to launch.

I've published this page online, so you can open it and take a look.

Here is Opus 4.6's result, taken from a screenshot of the video.

Here is GPT-5.3's result.

All three designs are usable. GPT-5.3's design has a flaw, though: the header isn't sticky, so it disappears when you scroll down. It also isn't as attractive as the other two.

So in this test, GLM-5 and Opus 4.6 perform better, and which of the two is superior comes down to personal taste. Personally, I prefer GLM-5's design style.

V. 3D Sandbox Test

The second test evaluates the models' ability to generate 3D animation.

The task is to build an educational 3D sandbox web page. It should animate the motion of the bodies in the solar system, let the user adjust parameters such as mass, position, and velocity, and allow new bodies to be added manually.

Below is GLM-5's result.

The right side of the page is the animation area, which by default shows three small planets orbiting a central star. You can rotate the view 360 degrees with the mouse and zoom in and out.

The left side of the page is the control panel, which is quite well done.

The upper part adjusts animation and celestial-body parameters, while the lower part adds new bodies or removes existing ones.

For comparison, here is Opus 4.6's result.

And here is GPT-5.3's result.
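A sandbox like this has to turn each body's adjustable mass, position, and velocity into motion, frame by frame. The sketch below shows one common way to do that. It uses Newtonian gravity with a semi-implicit (symplectic) Euler step, plus the formula v = sqrt(G·M/r) for starting a planet on a circular orbit. It is an illustration under stated assumptions, not code produced by any of the three models. The units (G = 1), the softening term, and all names are hypothetical, and rendering is left out.

```python
import math
from dataclasses import dataclass

G = 1.0  # gravitational constant in simulation units (assumption, not SI)


@dataclass
class Body:
    mass: float
    pos: list[float]  # [x, y, z]
    vel: list[float]  # [vx, vy, vz]


def accelerations(bodies, softening=1e-3):
    """Newtonian gravitational acceleration on each body from all the others.

    The small softening term keeps the force finite if two bodies overlap.
    """
    acc = [[0.0, 0.0, 0.0] for _ in bodies]
    for i, bi in enumerate(bodies):
        for j, bj in enumerate(bodies):
            if i == j:
                continue
            d = [bj.pos[k] - bi.pos[k] for k in range(3)]
            r2 = sum(c * c for c in d) + softening ** 2
            inv_r3 = 1.0 / (r2 * math.sqrt(r2))
            for k in range(3):
                acc[i][k] += G * bj.mass * d[k] * inv_r3
    return acc


def step(bodies, dt):
    """Advance the simulation by one frame using semi-implicit Euler."""
    acc = accelerations(bodies)
    for b, a in zip(bodies, acc):
        for k in range(3):
            b.vel[k] += a[k] * dt      # update velocity first...
            b.pos[k] += b.vel[k] * dt  # ...then move with the new velocity


def circular_orbit_speed(central_mass, radius):
    """Speed for a circular orbit around a much heavier body: v = sqrt(G*M/r)."""
    return math.sqrt(G * central_mass / radius)


# Usage: a heavy star at the origin and one planet on a near-circular orbit.
star = Body(1000.0, [0.0, 0.0, 0.0], [0.0, 0.0, 0.0])
planet = Body(1.0, [10.0, 0.0, 0.0], [0.0, circular_orbit_speed(1000.0, 10.0), 0.0])
bodies = [star, planet]
for _ in range(1000):
    step(bodies, 0.001)
```

Updating velocity before position keeps orbits stable over long runs, which matters when the user leaves the animation playing. Changing a body's mass or velocity in the control panel would simply mean editing the corresponding Body fields between frames.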

All three results meet the requirements and run smoothly. However, GLM-5's animation lacks the gravity grid lines, while GPT-5.3's grid lines are too cluttered, so Opus 4.6 has the best animation effects.

As for the control panel, both GLM-5 and Opus 4.6 are well designed, while GPT-5.3's is a bit basic.

Overall, I think the best performer in this round is Opus 4.6, followed by GLM-5, with GPT-5.3 last.
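In demos like this, a "gravity grid" is usually a flat mesh pulled downward around each body, as a rough visual stand-in for a gravity well. Assuming that is what the grid lines here represent, the sketch below computes each vertex's depth from the softened Newtonian potential -G·m/r. The grid size, scale factor, softening, and function names are hypothetical choices for illustration.

```python
import math

G = 1.0  # gravitational constant in simulation units (assumption)


def grid_depth(x, z, bodies, softening=0.5):
    """Gravitational potential at grid point (x, z); always <= 0.

    bodies is a list of (mass, bx, bz) tuples, i.e. body positions projected
    onto the grid plane. Heavier or closer bodies give a deeper well.
    """
    phi = 0.0
    for mass, bx, bz in bodies:
        r = math.hypot(x - bx, z - bz)
        phi -= G * mass / math.sqrt(r * r + softening * softening)
    return phi


def build_grid(bodies, half_size=20.0, n=41, scale=0.05):
    """Return an n x n grid of (x, y, z) vertices, with y lowered by the potential."""
    spacing = 2 * half_size / (n - 1)
    grid = []
    for i in range(n):
        row = []
        for j in range(n):
            x = -half_size + i * spacing
            z = -half_size + j * spacing
            row.append((x, scale * grid_depth(x, z, bodies), z))
        grid.append(row)
    return grid


# Usage: one heavy star at the center of the grid.
grid = build_grid([(100.0, 0.0, 0.0)])
```

Drawing line segments along each row and column of the grid produces the warped mesh. Rebuilding it every frame lets the wells follow the moving bodies, and a cleaner look mostly comes down to choosing a sensible grid spacing and scale.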

VI. Web Games

The third test is to build a web version of the game "Angry Birds."

GLM-5's result is decent. It looks quite similar to the original and is playable, but the gameplay falls short and the slingshot launch doesn't feel good enough.

Opus 4.6 recreates the original faithfully, and the gaming experience is close to the real thing.

GPT-5.3's result is embarrassing: the birds can't be launched at all, so the game is unplayable.

Clearly in this round, Opus 4.6 is the best, followed by GLM-5.
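The core mechanic judged in this round is the slingshot. The player drags the bird back from the sling's anchor point. On release, the bird flies off in the opposite direction, faster the farther it was pulled, and then follows a ballistic arc under gravity. The minimal sketch below shows that logic. It assumes a simple spring-like launch (speed proportional to pull distance, clamped to a maximum) and a fixed 60 fps time step. The constants and function names are hypothetical, and collisions with blocks and pigs are left out.

```python
import math

GRAVITY = 9.8    # downward acceleration in world units per second^2 (assumed)
SPRING_K = 12.0  # launch speed per unit of pull distance (hypothetical tuning)
MAX_PULL = 2.0   # farthest the bird can be pulled back (hypothetical)


def launch_velocity(anchor, release):
    """Velocity of a bird released at `release` from a sling anchored at `anchor`.

    The bird flies opposite to the pull direction, with speed proportional to
    the (clamped) pull distance, like an ideal spring.
    """
    dx = anchor[0] - release[0]
    dy = anchor[1] - release[1]
    dist = math.hypot(dx, dy)
    if dist == 0:
        return (0.0, 0.0)
    speed = SPRING_K * min(dist, MAX_PULL)
    return (speed * dx / dist, speed * dy / dist)


def trajectory(start, velocity, dt=1 / 60, ground_y=0.0, max_steps=10_000):
    """Integrate the bird's flight frame by frame until it reaches the ground."""
    x, y = start
    vx, vy = velocity
    points = [(x, y)]
    for _ in range(max_steps):
        vy -= GRAVITY * dt  # gravity changes vertical speed each frame
        x += vx * dt
        y += vy * dt
        points.append((x, y))
        if y <= ground_y:
            break
    return points


# Usage: pull the bird 1.5 units down and to the left, then release.
v = launch_velocity((0.0, 1.0), (-1.2, 0.1))
path = trajectory((-1.2, 0.1), v)
```

How good the launch feels comes mostly from tuning SPRING_K and MAX_PULL. In practice, a browser game would more likely hand the flight and the collisions to a 2D physics engine such as Matter.js than integrate them by hand.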

VII. Laravel to Next.js

The last test is to convert a web application built with Laravel, a PHP framework, to Next.js, a JavaScript framework.

GLM-5 handled it without any issues, quickly converting the PHP code to JavaScript and presenting the structure of the converted code.

After the conversion, it also installed the required packages automatically, finished compiling the scripts, and told the user: "Just hook up the external API, and running npm run dev will start the app directly."

I followed its instructions, everything ran smoothly without errors, and I could access the application by opening localhost:3000.

It's an application for checking the weather by city. Since the task didn't ask for a style change, it looks exactly the same as the original PHP version.

The input box in the upper right corner allows you to search for cities.

In the search results, select the city you want.

Clicking a city takes you to its detail page, which shows the weather, sunrise and sunset times, air quality, a map, and other information.

Opus 4.6 and GPT-5.3 produced the same result. Since the pages and features are identical, I'm not showing their screenshots.

It's worth mentioning that GLM-5 and GPT-5.3 each took about 5 minutes for the conversion, while Opus 4.6 seemed to run into some issues and took a full 20 minutes.

In this round all three models performed well, but GLM-5 was fast, made no errors, and offered a good overall experience, so it gets my vote.

VIII. Summary

After these tests, I find GLM-5's programming performance impressive. It can stand alongside the latest flagship models from overseas companies and even outperforms them in some respects. Where it falls short, the gap usually comes down to minor details rather than fundamental differences.

Reportedly, both training and inference run on a domestic (Chinese) "10,000-card cluster," i.e., a computing cluster of ten thousand accelerator cards. One can imagine that with more cards and computing power its performance would improve further, enough to compete head-to-head with the world's top large-model companies.

Additionally, the gains in the two areas it specifically strengthened this time, "complex systems" and "long-horizon tasks," are noticeable.

The system logic and back-end code it generates are reliable, with few errors during generation or at runtime. The gaps are usually missing features that AI can fill in later, not architectural problems. Also, one of my personal tasks ran for a solid two hours and finished without going off track.

I'd like to end with a statement from the official announcement.

In 2026, coding models are evolving from "able to write code" to "able to build systems," and the open-source community hails GLM-5 as the "system architect model." It has shifted its focus from "front-end aesthetics" to "agentic depth and engineering capability," and that shift has made it the domestic open-source alternative to Opus 4.6 and GPT-5.3.

(End)