Apple Silicon costs more than OpenRouter
2026-05-17 · via Hacker News: Front Page

Offline Agentic Coding part 3: Apple Silicon costs more than OpenRouter.

Published 2026-05-17

[Figure: spreadsheet showing tokens per second and costs, giving the overall cost per million tokens.]


At ~50-100 watts under load and ~$0.20 per kWh, my M5 MacBook Pro costs one to two cents per hour in electricity. Accelerated depreciation (if any) from shortening the device's lifespan will be more expensive than the electricity. At a few tens of tokens per second, this works out to an amortized cost of ~$1.50 per million tokens. OpenRouter serves comparable models at about 1/3 the price and ~2x the speed.

Electricity

In Northern Virginia, my last electricity bill worked out to $0.18 per kilowatt-hour. Let's round up to $0.20 per kWh.

The EIA puts the average US residential price for 2025 at $0.1730 per kWh.
https://www.eia.gov/electricity/monthly/epm_table_grapher.php?t=table_5_03

At ~50-100 watts and $0.18/kWh, that's $0.009 to $0.018 per hour. Rounding up to $0.02 per hour gives $0.48 per day for the electricity to run inference at 100%.
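The electricity arithmetic is simple enough to check in a few lines. A minimal Python sketch, assuming a constant draw at the stated wattage (real load varies); the function name `electricity_cost_per_hour` is hypothetical:

```python
def electricity_cost_per_hour(watts: float, usd_per_kwh: float) -> float:
    """Dollars per hour for a device drawing `watts` continuously."""
    kilowatts = watts / 1000.0  # convert W to kW so units match the $/kWh rate
    return kilowatts * usd_per_kwh


if __name__ == "__main__":
    rate = 0.18  # $/kWh from the Northern Virginia bill
    for watts in (50, 100):
        hourly = electricity_cost_per_hour(watts, rate)
        # Unrounded daily cost; the post rounds the hourly figure up to $0.02 first.
        print(f"{watts:>3} W: ${hourly:.3f}/hour, ${hourly * 24:.3f}/day")
```

At 100 W this prints $0.018 per hour and $0.432 per day; the $0.48 figure comes from rounding the hourly cost up to $0.02 before multiplying by 24.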

Hardware

A 14-inch MacBook Pro with an M5 Max and 64 GB of RAM is currently listed at $4,299 on Apple's website. 128 GB will cost you more, but 64 GB should run a model like Gemma 4 31B, which is almost at Anthropic Sonnet-level performance.

For cost allocation, let's assume this hardware lasts 3, 5, or 10 years. The cost per year is then $1,433, $860, or $430, respectively.

Spreading that over every hour of the year (8,760 hours, i.e. assuming the machine runs around the clock), the hourly cost is:

  • 3 years: $0.16358
  • 5 years: $0.09815
  • 10 years: $0.04908
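The hourly figures are straight-line amortization of the $4,299 list price over every hour of the lifespan. A minimal sketch, assuming 365-day years and round-the-clock use; `hardware_cost_per_hour` is a hypothetical helper name:

```python
HOURS_PER_YEAR = 24 * 365  # 8,760 hours; assumes the machine runs around the clock


def hardware_cost_per_hour(price_usd: float, lifespan_years: float) -> float:
    """Straight-line amortization of the purchase price over the lifespan, per hour."""
    return price_usd / (lifespan_years * HOURS_PER_YEAR)


if __name__ == "__main__":
    price = 4299.0  # 14-inch MacBook Pro, M5 Max, 64 GB (list price quoted in the post)
    for years in (3, 5, 10):
        print(f"{years:>2} years: ${price / years:,.0f}/year, "
              f"${hardware_cost_per_hour(price, years):.5f}/hour")
```

This treats the laptop as worth nothing at the end of its life; any resale value would lower these numbers.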

I think 5 years is a reasonable useful lifespan for normal use, and 7 or 10 years is very plausible. Under maxed-out inference, 3 years may be a reasonable estimate as well.

Tokenomics

The big question is how many tokens per hour you can get out of a local model. My M5 Max testing seems to land in the 10-40 tokens per second range for a serious model like Gemma4:31b. At 10 tokens per second, that's 36,000 tokens per hour.

That is the expensive end: at 50 watts and $0.18 per kWh, 36,000 tokens per hour across our 3-10 year lifespans gives a price of $1.61 to $4.79 per million tokens.

At 40 tokens per second, that's 144,000 tokens per hour, which brings the price down to $0.40 to $1.20 per million tokens.
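Putting the two halves together, cost per million tokens is (hardware cost per hour + electricity cost per hour) / tokens per hour × 1,000,000. A minimal sketch reproducing the ranges above; the helper name is hypothetical, and the 50 W draw is an assumption inferred from the numbers (they match 50 W rather than 100 W):

```python
HOURS_PER_YEAR = 24 * 365  # assumes round-the-clock inference


def cost_per_million_tokens(price_usd: float, lifespan_years: float, watts: float,
                            usd_per_kwh: float, tokens_per_second: float) -> float:
    """Amortized hardware plus electricity, divided by hourly token output."""
    hardware_per_hour = price_usd / (lifespan_years * HOURS_PER_YEAR)
    electricity_per_hour = watts / 1000.0 * usd_per_kwh
    tokens_per_hour = tokens_per_second * 3600
    return (hardware_per_hour + electricity_per_hour) / tokens_per_hour * 1_000_000


if __name__ == "__main__":
    for tps in (10, 40):
        # Lifespans ordered cheapest (10 years) to most expensive (3 years).
        costs = [cost_per_million_tokens(4299, years, 50, 0.18, tps) for years in (10, 5, 3)]
        print(f"{tps} tok/s: " + ", ".join(f"${c:.2f}" for c in costs))
```

This prints $1.61 and $4.79 for the 10- and 3-year cases at 10 tokens per second, and $0.40 and $1.20 at 40. At 100 W, the 3-year, 10 tokens-per-second case rises to about $5.04.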

For Apple silicon, the hardware cost dominates.
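One way to see why the hardware dominates is to compute its share of the hourly cost across the lifespan and wattage scenarios in the post. A minimal sketch under the same round-the-clock assumption; `hardware_share` is a hypothetical helper:

```python
from itertools import product

HOURS_PER_YEAR = 24 * 365  # assumes round-the-clock inference


def hardware_share(price_usd: float, lifespan_years: float, watts: float,
                   usd_per_kwh: float) -> float:
    """Fraction of the hourly running cost that comes from amortized hardware."""
    hardware = price_usd / (lifespan_years * HOURS_PER_YEAR)
    electricity = watts / 1000.0 * usd_per_kwh
    return hardware / (hardware + electricity)


if __name__ == "__main__":
    for years, watts in product((3, 5, 10), (50, 100)):
        share = hardware_share(4299, years, watts, 0.18)
        print(f"{years:>2} years, {watts:>3} W: hardware is {share:.0%} of the hourly cost")
```

Even in the case most favorable to electricity (10 years at 100 W), hardware is roughly 73% of the hourly cost; at 3 years and 50 W it is about 95%.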

OpenRouter lists Gemma 4 31B at ~38-50 cents per million tokens. On the optimistic side (50 watts, 40 tokens per second, and 10 years), the MacBook Pro is as cheap as OpenRouter. On the pessimistic side (100 watts, 3 years, and 10 tokens per second), it is roughly 10x the cost. From an accounting perspective, I think ~3x the cost per million tokens is likely the right number for local inference on this machine.
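The optimistic and pessimistic comparisons can be checked by dividing the local cost per million tokens by OpenRouter's ~$0.38-0.50 price. A minimal sketch; the scenario table and helper name are illustrative, not taken from the post's spreadsheet:

```python
HOURS_PER_YEAR = 24 * 365  # assumes round-the-clock inference

OPENROUTER_USD_PER_M = (0.38, 0.50)  # quoted price range for Gemma 4 31B

SCENARIOS = {
    # name: (lifespan_years, watts, tokens_per_second)
    "optimistic": (10, 50, 40),
    "pessimistic": (3, 100, 10),
}


def cost_per_million_tokens(price_usd: float, lifespan_years: float, watts: float,
                            usd_per_kwh: float, tokens_per_second: float) -> float:
    """Amortized hardware plus electricity per million generated tokens."""
    hourly = price_usd / (lifespan_years * HOURS_PER_YEAR) + watts / 1000.0 * usd_per_kwh
    return hourly / (tokens_per_second * 3600) * 1_000_000


if __name__ == "__main__":
    cheapest, priciest = OPENROUTER_USD_PER_M
    for name, (years, watts, tps) in SCENARIOS.items():
        local = cost_per_million_tokens(4299, years, watts, 0.18, tps)
        print(f"{name}: ${local:.2f}/M tokens, "
              f"{local / priciest:.1f}x to {local / cheapest:.1f}x OpenRouter")
```

The optimistic case lands at roughly 0.8x to 1.1x OpenRouter's price and the pessimistic case at roughly 10x to 13x, consistent with the "as cheap" and "10x" figures above.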

Conclusion

For most cases, though, inference speed is the biggest factor. Local inference is slower than cloud inference: some of the Gemma 4 providers on OpenRouter reach 60-70 tokens per second, which is 3-7 times faster than what I'm seeing with the MacBook Pro (~10-20 tokens per second). For a human employee with a work laptop, salary costs are going to be ~1000x the cost of the tokens they can generate locally. Throwing money at Anthropic makes more sense in this context.
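The 3-7x speed gap comes from dividing the slowest cloud speed by the fastest local speed, and the fastest cloud speed by the slowest local speed. A minimal sketch; the 2,000-token response length is a hypothetical example, not a figure from the post:

```python
def speedup_range(local_tps: tuple[float, float],
                  cloud_tps: tuple[float, float]) -> tuple[float, float]:
    """Smallest and largest cloud-over-local speedup from (low, high) tokens/second ranges."""
    local_lo, local_hi = local_tps
    cloud_lo, cloud_hi = cloud_tps
    return cloud_lo / local_hi, cloud_hi / local_lo


def seconds_to_generate(tokens: int, tokens_per_second: float) -> float:
    """Wall-clock seconds to emit `tokens` at a steady generation speed."""
    return tokens / tokens_per_second


if __name__ == "__main__":
    local, cloud = (10, 20), (60, 70)  # tokens/second figures from the post
    low, high = speedup_range(local, cloud)
    print(f"cloud is {low:.0f}x to {high:.0f}x faster")
    response = 2000  # hypothetical response length in tokens
    for label, (lo, hi) in (("local", local), ("cloud", cloud)):
        fastest = seconds_to_generate(response, hi)
        slowest = seconds_to_generate(response, lo)
        print(f"{label}: {fastest:.0f}-{slowest:.0f} s for {response} tokens")
```

For that hypothetical response, local generation takes 100-200 seconds versus roughly half a minute in the cloud, which is where the waiting cost shows up for a salaried user.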

It's still wild that a consumer device can run models close to Anthropic Sonnet-level performance.