

Snowflake CEO Sridhar Ramaswamy Shares Detailed Post Comparing Opus 4.7 And GLM 5.2
OfficeChai Team · 2026-06-24 · via OfficeChai

Z.ai’s GLM-5.2 has gone viral in recent days, and a top CEO has now shared how the model compares with another one on the frontier.

Snowflake CEO Sridhar Ramaswamy has posted a detailed breakdown comparing Z.ai’s GLM-5.2 with Anthropic’s Claude Opus 4.7 on dbt-bench, a benchmark designed to evaluate AI models on data transformation and analytics engineering tasks. The findings suggest that while the two models end up with nearly identical overall success rates, they get there in very different ways.

The analysis came from Snowflake’s Coco team, which ran 103 dbt tasks with three trials each on both models. The headline numbers show an almost dead heat. GLM-5.2 achieved a Pass@3 score of 66 percent, while Opus 4.7 came in at 67 percent. At the first-attempt level, however, Opus held a clearer lead, scoring 53.7 percent on Pass@1 compared to GLM’s 47.6 percent.
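The article does not spell out how Snowflake's team defined the two metrics, so the following is only a minimal sketch under a common reading: Pass@1 as the average per-trial success rate, and Pass@3 as the share of tasks solved in at least one of three trials. The task names and outcomes are hypothetical and are not Snowflake's data.

```python
from typing import Dict, List


def pass_at_1(results: Dict[str, List[bool]]) -> float:
    """Mean per-trial success rate across tasks (assumed Pass@1 definition)."""
    per_task = [sum(trials) / len(trials) for trials in results.values()]
    return sum(per_task) / len(per_task)


def pass_at_k(results: Dict[str, List[bool]]) -> float:
    """Share of tasks solved in at least one trial (assumed Pass@3 when each task has 3 trials)."""
    return sum(any(trials) for trials in results.values()) / len(results)


# Hypothetical outcomes for four tasks with three trials each.
toy_results = {
    "task_a": [True, True, True],
    "task_b": [False, True, False],
    "task_c": [False, False, False],
    "task_d": [True, False, True],
}

print(f"Pass@1: {pass_at_1(toy_results):.3f}")
print(f"Pass@3: {pass_at_k(toy_results):.3f}")
```

Under these assumptions, a model that succeeds inconsistently can still reach a high Pass@3 while trailing on Pass@1, which is the pattern the reported numbers suggest for GLM-5.2.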

The results are noteworthy because GLM-5.2 has generated significant interest in recent weeks for delivering strong performance as an open model. Earlier this year, China’s GLM family had already begun climbing coding leaderboards, with GLM-5.1 becoming one of the highest-ranked open models on Code Arena.

According to Ramaswamy, one of the biggest differences between the models lies in how they approach tasks. GLM takes considerably more turns to complete work, averaging 99 turns compared to 80 for Opus. It also makes more execution-related tool calls, averaging 40 per trial against Opus’s 29.

That difference translates into token consumption. Across the benchmark run, GLM used 860 million billing tokens compared to Opus’s 439 million. Snowflake’s team attributed this to a combination of more conversational turns, more atomic API calls, and lower prompt-cache reuse rates.
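To put the reported figures on one scale, this short sketch computes the GLM-to-Opus ratio for each metric. The numbers are the ones quoted in the article; the dictionary keys and the `ratio` helper are illustrative names only.

```python
# Figures as reported in the article: (GLM-5.2, Opus 4.7).
reported = {
    "turns per trial": (99, 80),
    "execution tool calls per trial": (40, 29),
    "billing tokens (millions)": (860, 439),
}


def ratio(glm: float, opus: float) -> float:
    """How many times larger the GLM figure is than the Opus figure."""
    return glm / opus


for name, (glm, opus) in reported.items():
    print(f"{name}: GLM/Opus = {ratio(glm, opus):.2f}x")
```

The ratios come out at roughly 1.24x for turns, 1.38x for execution tool calls, and 1.96x for billing tokens. The token gap is wider than the turn gap, which fits the team's explanation that smaller API calls and lower prompt-cache reuse add cost beyond the extra turns.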

The popular perception that GLM verifies its work more thoroughly was only partially supported by the data. The study found that GLM performs validation differently rather than necessarily performing more meaningful validation. It often executes individual SQL checks one at a time, while Opus bundles similar checks together. Both models end up covering similar ground, but their workflows look very different under the hood.
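The difference between the two validation styles can be illustrated with a small sketch. The `orders` table, its columns, and the three checks are hypothetical, and SQLite stands in only because it ships with Python; the benchmark itself targeted DuckDB and Snowflake.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(1, 10.0), (2, None), (3, 5.5), (3, 5.5)],
)

# Style 1: one check per query, so three separate executions.
row_count = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
null_amounts = conn.execute(
    "SELECT COUNT(*) FROM orders WHERE amount IS NULL"
).fetchone()[0]
distinct_ids = conn.execute("SELECT COUNT(DISTINCT id) FROM orders").fetchone()[0]
one_at_a_time = (row_count, null_amounts, distinct_ids)

# Style 2: the same three checks bundled into a single query.
bundled = conn.execute(
    """
    SELECT COUNT(*),
           SUM(CASE WHEN amount IS NULL THEN 1 ELSE 0 END),
           COUNT(DISTINCT id)
    FROM orders
    """
).fetchone()

print(one_at_a_time, bundled)
```

Both styles return the same facts about the table. The bundled form simply needs one execution instead of three, which is one way similar coverage can produce very different call counts.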

The findings also challenge another common assumption: that heavier verification automatically leads to better outputs. Despite GLM’s tendency to perform more checks, Opus still held a six-percentage-point advantage on Pass@1. As Ramaswamy put it, “more verification ≠ more correct.”

The area where GLM appeared to have a distinct advantage was cross-platform validation. The benchmark required solutions to work on both DuckDB and Snowflake. Snowflake’s team found that GLM was more consistent in validating against both targets, which explained several tasks that GLM solved successfully while Opus did not.

The post also highlighted two recurring failure modes. In some cases, GLM gave up too early when it couldn’t infer a solution path from available information. In one task cited by the team, the model performed five file reads across 22 turns but never attempted a write operation before stopping.

The opposite problem appeared in other tasks. One example saw GLM make 411 tool calls over 24 minutes while exhaustively checking row counts, distributions, null values, column types and platform parity. The task still failed in all three attempts. Opus completed the same task with 49 calls in nine minutes.

Interestingly, the “GLM uses twice as many calls” narrative turned out to be somewhat misleading. On tasks that both models solved successfully, GLM used only around 17 percent more calls. The large gap emerged primarily from difficult edge cases where the model entered lengthy verification loops.

The conclusion from Snowflake’s analysis was nuanced. Verification volume by itself was not a reliable predictor of success. Several of GLM’s worst failures came from spending enormous effort validating the wrong aspects of a task, while another category of failures stemmed from abandoning tasks prematurely.

Even so, Ramaswamy sounded optimistic about the model’s future. He said Snowflake was “super excited” about what GLM-5.2 represents and was looking forward to tuning Coco’s evaluation harness further and making the model available to customers.

The post offers a rare look at how frontier models behave beyond benchmark leaderboards. While aggregate scores often dominate discussion, Snowflake’s analysis shows that the path a model takes to reach those scores can reveal just as much about its strengths and weaknesses.