惯性聚合 高效追踪和阅读你感兴趣的博客、新闻、科技资讯
阅读原文 在惯性聚合中打开

推荐订阅源

Simon Willison's Weblog
Simon Willison's Weblog
Help Net Security
Help Net Security
P
Privacy International News Feed
T
Threat Research - Cisco Blogs
C
Cisco Blogs
C
CERT Recently Published Vulnerability Notes
NISL@THU
NISL@THU
L
LINUX DO - 热门话题
Security Latest
Security Latest
A
Arctic Wolf
G
GRAHAM CLULEY
月光博客
月光博客
S
Securelist
D
Docker
J
Java Code Geeks
T
Troy Hunt's Blog
T
Tenable Blog
让小产品的独立变现更简单 - ezindie.com
让小产品的独立变现更简单 - ezindie.com
SecWiki News
SecWiki News
S
Security @ Cisco Blogs
量子位
freeCodeCamp Programming Tutorials: Python, JavaScript, Git & More
L
LINUX DO - 最新话题
Recent Commits to openclaw:main
Recent Commits to openclaw:main
aimingoo的专栏
aimingoo的专栏
博客园 - 【当耐特】
H
Heimdal Security Blog
The Hacker News
The Hacker News
博客园 - 三生石上(FineUI控件)
Application and Cybersecurity Blog
Application and Cybersecurity Blog
N
Netflix TechBlog - Medium
Vercel News
Vercel News
Forbes - Security
Forbes - Security
B
Blog RSS Feed
H
Hackread – Cybersecurity News, Data Breaches, AI and More
IT之家
IT之家
B
Blog
MongoDB | Blog
MongoDB | Blog
博客园 - 聂微东
Google DeepMind News
Google DeepMind News
S
Secure Thoughts
Threat Intelligence Blog | Flashpoint
Threat Intelligence Blog | Flashpoint
C
Check Point Blog
云风的 BLOG
云风的 BLOG
CTFtime.org: upcoming CTF events
CTFtime.org: upcoming CTF events
T
The Blog of Author Tim Ferriss
L
Lohrmann on Cybersecurity
F
Full Disclosure
D
Darknet – Hacking Tools, Hacker News & Cyber Security
P
Proofpoint News Feed

HN's home page

Rainbow Query Language | Hacker News Exec into Node via Kubectl An AI native hedge fund The Seven-Action Documentation Model | Hacker News Package Manager for Kubectl Plugins Tongan Castaways | Hacker News Tech overlords plan for conscious AI to conquer the cosmos. What could go wrong? Data Breach Disclosure Lag Is Getting Worse How LLMs Work | Hacker News I Dropped PRDs for Shape Up Go Experiments Explained | Hacker News FCA's Palantir deal could expose UK financial data to Trump's US, critics fear WebXR BCI for Neural-Adaptive Avatar Control in Mixed Reality The first murder conviction via DNA analysis Tom Interviews Theo de Raadt of the OpenBSD Project (2019) [video] Show HN: Replace shell commands with bun shell typescript scripts Quay.io Is Down | Hacker News AI driven analysis of brokerage account fees in the UK Bill Gates Spent Years Crafting His Image. Now It's Cracking Using LLMs to secure source code Wi-Fi 8 in the Lab [video] The household battery revolution that could change energy bills and the world Is Python Becoming Pinyin? | Hacker News Livia – Executive Assistant | Hacker News FindMyPipe – Query Apple Find My from Linux for AI Agents Show HN: Agent skill for creating product launch videos with Remotion RecruitMyself – AI job search copilot for resumes and applications AI coding agents and the erosion of system understanding The 'Resting' Generation and South Korea's Youth Recession AMD Computex 2026: 10 Years of AM4, AM5 Support Through 2029 Docker Networking Explained | Hacker News Textbooks in Tokenland | Hacker News Key Chemistry Question Answered, No Quantum Computer Required Gifts For Retrocomputing Fans – remix yesterday's tech with a modern spin Miscellany № 49: introducing the quasiquote – Shady Characters Amazon Thinks the Future of Data Centers Is a Technical Problem It Just Solved A brief history of the UUID (2017) Flying High Unpressurized (2016) | Hacker News Five Years of Trying to Add Recursion to Lychee How British comfort food won over the French Blorp Language | Hacker News Decache – you might have the internet's lost media in your PC's cache folders Criminal Activities and Migration | Hacker News A free, open-source library of DESIGN.md files for AI-generated UIs MiniMax M3 | Hacker News People are apparently farming citations on ResearchGate – Chuniversiteit Hacker News Basketeer – a typed TS SDK for your Tesco account, with nutrition data 'Penguin' decays from CERN's Large Hadron Collider experiment hint new physics Emergence World: A Laboratory for Evaluating Long-Horizon Agent Autonomy Homebrew lead Mike McQuaid: Sandboxes and Worktrees - My Secure Agentic AI Setup Lean, Not Backpressure | Hacker News AI Dangers Eclipse Nuclear Weapons at Singapore Defense Forum Open source analytics that answers backbase How turkey hacked the hair-transplant industry How GPT Image 2 Is Transforming Marketing Workflows in 2026 Improve Git monorepo performance with a file system monitor Strava for Claude Code MiniMax M3 on Qubrid AI There's Something Else We Should Be Worrying About Celebrity Profile of an A.I. Actress What Is Windows K2? | Hacker News AI is devoid of meaning and humanity. Its vapid voice suits the political moment Show HN: Interpreto – Live Translation for Travel Taxicab Geometry Sealed classes and interfaces in Java (2025) Show HNs | Hacker News My AI Skill Edited This Video That Explains My AI Skill – Arcturus Labs Amazon Pinpoint End of Support The Mystery of the Backward Index MP/M's Process Dispatcher SlimTide Reviews: A Modern Solution for Metabolism and Energy Learning Lustre: Type-safe front end development with gleam Thomas Mann: Goethe Heartened by Panama (As Suez for English, or Danube-Rhine) How to make Message Log of the Unreal Engine 100 times faster Sum-product, unit distances, and number fields Can Meta Buy Belief? | Hacker News Twenty Years of Bigtable | Hacker News Show HN: Combine WigglyPaint GIFs into Video Show HN: AgentThreatBench – Benchmark for AI Agent Memory Security Genius Spotted in the Wild Napkins: Where Ethernet, Compaq and Facebook’s cool data center got their starts (2011) Moderate caffein use alters sleep-related EEG Nvidia Announces RTX Spark | Hacker News Show HN: Ministry of Everything – CLI agent harness for a single operator CEOs blame AI for layoffs, MIT prof says it fits a pattern to find cover story Bugs I didn't expect while building a zsh cleanup script for macOS dev machines Nvidia jumps into PCs with new chip debuting in laptops from Microsoft, Dell, HP Nvidia unveils PC 'superchip' in challenge to Apple and Intel Show HN: Having fun making mini static site apps Synthea API: Create Synthetic Medical Records as a Service Berkshire Hathaway to buy Taylor Morrison for $6.8B in cash The most complex model we understand [video] SanDisk stock is +4,440.53% in the past year Driftwm: What if your window manager worked like a whiteboard? US Immigration enforcement looks into buying ad data AI Is Creating More Work for Australia's Workplace Tribunal Finding New Biblical Cross-References with Codex Glide: A tiling window manager for macOS Ultra-highly efficient enrichment of uranium from seawater via studtite nanodots (2024)
Show HN: Smart model routing directly in Claude, Codex and Cursor
adchurch · 2026-06-27 · via HN's home page

We built a model router that plugs into coding agents (e.g. Claude Code, Codex, Cursor, etc.) and intelligently sends requests to the best model to serve them. Here's a quick demo of running it locally: https://youtu.be/isKhAyivtfM

At Weave, we write ~all our code with AI, and it's been getting more expensive. This came to a head when Opus 4.7 was released and, thanks to its tokenizer changes, our costs shot up. We knew we didn't need Opus for everything but we didn't want to lose out on the intelligence for the cases where you really need it. So we decided to build a model router to handle this for us.

The Weave Router acts as an Anthropic/OpenAI endpoint specifically for coding agents. It looks at every inference request and intelligently (more on that in a sec) decides what model to send it to, handling all the translations required along the way. So it can use faster/cheaper models (e.g. DeepSeek v4, GLM 5.2, Kimi K2.6) when possible, and frontier models (Opus 4.8 & GPT 5.5 (& Fable whenever it's back)) when necessary.

How do we know what model to route to? We trained an RL model on tens of thousands (so far!) of agent traces. We reward the routing model when it selects an LLM that successfully completes the given task.

Here's an example: if you ask the router to plan a complex change, it will (probably) route that request to Opus 4.8. Subagents exploring the codebase to gather context will be routed to more suitable models (e.g. DeepSeek V4 Flash). Then when you have the plan ready to implement, it will be (most likely) be handed to a quicker model (e.g. GLM 5.2) to carry it out.

We've been using this internally for the last month or so. We've saved 40% on tokens vs. what we otherwise would have paid, with no noticeable differences in quality or velocity.

The router is source-available under Elastic License 2.0, so you can self-host it. Or if you prefer, you can also use our hosted version: weaverouter.com.

I'll be here to answer any questions you may have!