OpenCode Go + Oh My OpenAgent: The Model Routing Config That Actually Saves Money
Devansh · 2026-05-24 · via DEV Community

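Most OpenCode Go guides start with the models. I want to start with what they get wrong: the limits are counted in dollars, not in requests.

That sounds like a minor distinction. It is not.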

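What most people miss

OpenCode Go costs $5 for the first month, then $10 per month. The usage limits are $12 per 5 hours, $30 per week, and $60 per month.

Within one 5-hour window, $12 buys about 31,650 requests to DeepSeek V4 Flash. The same $12 buys about 880 requests to GLM-5.1. The budget is identical, but the volume differs by a factor of 36.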
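
The arithmetic behind that comparison can be sketched as below. This is only a back-of-the-envelope calculation from the two request counts quoted above; the helper name `implied_cost_per_request` is made up for illustration and is not part of OpenCode Go.

```python
# Figures quoted in the text for one $12, 5-hour window.
WINDOW_BUDGET_USD = 12.00
REQUESTS_PER_WINDOW = {
    "DeepSeek V4 Flash": 31_650,
    "GLM-5.1": 880,
}


def implied_cost_per_request(budget_usd: float, requests: int) -> float:
    """Average dollars per request if the whole budget buys `requests` calls."""
    return budget_usd / requests


costs = {
    model: implied_cost_per_request(WINDOW_BUDGET_USD, n)
    for model, n in REQUESTS_PER_WINDOW.items()
}
volume_ratio = (
    REQUESTS_PER_WINDOW["DeepSeek V4 Flash"] / REQUESTS_PER_WINDOW["GLM-5.1"]
)

for model, cost in costs.items():
    print(f"{model}: ~${cost:.5f} per request")
print(f"Volume ratio: ~{volume_ratio:.0f}x")
```

Because the limit is a dollar amount, the per-request cost is the only thing that determines how far a window stretches.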

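That is the whole point of routing. If you pick one model and use it for everything, you either burn expensive requests on tasks that don't need them or leave cheap capacity underused. The right move is to assign a model to each task based on what that task actually needs.

MiniMax M2.5 is capped at 100,000 requests per month regardless of cost. It activates only about 10B parameters, and its input-token price is more than 16x lower than Claude Opus 4.6. For high-volume, low-complexity work it is the obvious choice, yet most people don't know it exists.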

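What you lose by using only a premium model

Say you route everything through DeepSeek V4 Pro: about 1,200 requests per 5-hour window. That seems fine for light use. But Oh My OpenAgent runs several agents in parallel. Prometheus breaks down your task, Metis consolidates context, Atlas handles sequencing, Sisyphus executes, and Librarian reads the docs. A single complex task can generate 30 to 50 requests before you have done anything yourself. Your 5-hour allowance is gone after a couple of hours of real work.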
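
To make the fan-out concrete, here is a small sketch using only the figures above (about 1,200 requests per window, 30 to 50 requests per complex task). The function name is hypothetical, and it assumes every task costs the same number of requests.

```python
def tasks_per_window(requests_per_window: int, requests_per_task: int) -> int:
    """Whole tasks that fit in one window at a fixed fan-out per task."""
    return requests_per_window // requests_per_task


V4_PRO_REQUESTS_PER_WINDOW = 1_200  # figure quoted in the text

best_case = tasks_per_window(V4_PRO_REQUESTS_PER_WINDOW, 30)
worst_case = tasks_per_window(V4_PRO_REQUESTS_PER_WINDOW, 50)
print(f"Complex tasks per 5-hour window: {worst_case} to {best_case}")
```

Under those assumptions a window covers a few dozen complex tasks, which is why an all-premium setup runs dry quickly in a multi-agent workflow.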

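The quality gap is not the problem. V4 Pro scores 80.6% and Claude Opus 4.7 scores 87.6%, a 7-point difference that is barely visible on ordinary tasks. The problem is that in a multi-agent pipeline, not every step needs that level of quality.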

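Tier breakdown with numbers

Here are the models that matter for coding work, with their benchmark scores and API prices, so the routing math is clear: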

| Model | SWE-Bench Verified | Input price (per M tokens) | Requests per 5 hours ($12) | Context |
|---|---|---|---|---|
| Claude Opus 4.7 | 87.6% | $5.00 | ~480 | 200K tokens |
| DeepSeek V4 Pro | 80.6% | $0.435 (promo, ends May 31) | ~5,000 | 1M tokens |
| Kimi K2.6 | 80.2% | $0.95 | ~2,500 | 256K tokens |
| Claude Sonnet 4.6 | 79.6% | $3.00 | ~800 | 200K tokens |
| MiMo-V2.5-Pro | 78.9% | ~$0.40 | | |
| Qwen3.6 Plus | 78.8% | $0.325 | | 1M tokens |
| DeepSeek V4 Flash | 79.0% | $0.14 | ~17,000 | 1M tokens |
| GLM-5.1 | 58.4% (SWE-Bench) | ~$1.50 | ~1,600 | 200K tokens |
| Qwen3.5 Plus | | $0.08 | ~30,000 | |
| MiniMax M2.5 | | $0.03 | up to 100,000 per month | |

(Requests per 5-hour window assume about 2,500 tokens per request.)

[Figure: Cost vs performance]

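Note: the original Kimi K2.6 stops receiving updates on May 25, 2026. The model remains available, but the series will see no further revisions. DeepSeek V4 Pro's promotional price ($0.435 per million tokens) ends on May 31; after that the price goes up, which changes the requests-per-window math.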

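Opus 4.7 is genuinely the strongest coding model available today, 7 points ahead of V4 Pro. But at $5 per million input tokens it costs about 35x more per token than DeepSeek V4 Flash. Within a $12, 5-hour window you get about 480 Opus 4.7 requests versus about 17,000 for DeepSeek V4 Flash.

DeepSeek V4 Flash trails V4 Pro by only 1.6 points (79.0% vs 80.6%), yet costs about 3x less per token. For routine coding the gap does not show up in practice. V4 Flash has 284B total parameters with 13B active; V4 Pro has 1.6T total with 49B active.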
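
The "35x" and "3x" figures follow directly from the table. A minimal check, using only the input prices and request counts listed above (output-token pricing is not given in this article, so it is left out):

```python
# Input prices in USD per million tokens, from the table above.
INPUT_PRICE_PER_M = {
    "Claude Opus 4.7": 5.00,
    "DeepSeek V4 Pro": 0.435,  # promotional price
    "DeepSeek V4 Flash": 0.14,
}
# Requests per $12 window, from the table above.
REQUESTS_PER_WINDOW = {"Claude Opus 4.7": 480, "DeepSeek V4 Flash": 17_000}

opus_vs_flash_price = (
    INPUT_PRICE_PER_M["Claude Opus 4.7"] / INPUT_PRICE_PER_M["DeepSeek V4 Flash"]
)
pro_vs_flash_price = (
    INPUT_PRICE_PER_M["DeepSeek V4 Pro"] / INPUT_PRICE_PER_M["DeepSeek V4 Flash"]
)
flash_vs_opus_volume = (
    REQUESTS_PER_WINDOW["DeepSeek V4 Flash"] / REQUESTS_PER_WINDOW["Claude Opus 4.7"]
)

print(f"Opus 4.7 vs V4 Flash, price per token: {opus_vs_flash_price:.1f}x")
print(f"V4 Pro vs V4 Flash, price per token:   {pro_vs_flash_price:.1f}x")
print(f"V4 Flash vs Opus 4.7, requests/window: {flash_vs_opus_volume:.1f}x")
```

The price ratio and the request-volume ratio land close to each other, which is what you would expect when the window is priced in dollars.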

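Kimi K2.6 is a 100B-parameter MoE model with 30B active parameters and 80.2% on SWE-Bench Verified. That beats Qwen3.6 Plus and comes close to V4 Pro, which makes it a solid choice for genuine multi-step reasoning when V4 Flash struggles.

GLM-5.1 has 74B total parameters with 40B active. Its 200K-token context suits deliberate, long-horizon planning, and at its mid-range price it handles the Oracle and Prometheus roles well.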

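How Oh My OpenAgent is structured

Oh My OpenAgent v4.2.3 (48K+ GitHub stars as of May 2026) uses a three-layer architecture:

The planning layer handles strategic decomposition and knowledge consolidation. It has two agents: Prometheus (breaks down what needs to be done) and Metis (consolidates context and existing knowledge).

The orchestration layer is Atlas. It holds the todo list, sequences the work, and tracks completion. It does not execute anything itself; it only decides what happens and in what order.

The execution layer is where the work gets done. Sisyphus is the main executor, with a 32K-token thinking budget. Nine or more specialist agents each handle a specific kind of task.

v4.0.0 added Team Mode, which enables 7 additional hooks for a total of 61, compared with 54 in standard mode. If you run parallel work, Team Mode is worth enabling. It is disabled by default.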

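The routing configuration

This is the community-recommended agent-to-model assignment. It was not derived from theory; it came out of a lot of trial and error: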

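| Agent | Primary model | Fallback |
|---|---|---|
| Sisyphus | Kimi K2.6 | DeepSeek V4 Pro, then Qwen3.6 Plus |
| Hephaestus | DeepSeek V4 Pro | DeepSeek V4 Flash, then Kimi K2.6 |
| Oracle | GLM-5.1 | Kimi K2.6, then DeepSeek V4 Pro |
| Librarian | DeepSeek V4 Flash | Qwen3.5 Plus |
| Explore | DeepSeek V4 Flash | |
| Prometheus | GLM-5.1 | Qwen3.6 Plus, then DeepSeek V4 Pro |
| Metis | Qwen3.6 Plus | DeepSeek V4 Pro |
| Atlas | DeepSeek V4 Pro | DeepSeek V4 Flash |
| Code Reviewer | Kimi K2.6 | DeepSeek V4 Pro |
| Multimodal-Looker | MiMo-V2.5-Pro | Qwen3.6 Plus |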

[Figure: Agent routing]

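Sisyphus gets Kimi K2.6 because it can think for up to 32K tokens. You want the strongest reasoning model here, even if the request volume is low. Kimi's 256K context window can hold long execution traces.

Librarian and Explore get V4 Flash. These agents read documentation, gather context, and run lookups. They do not need frontier-level reasoning. Wasting V4 Pro on Librarian is the most common budget mistake I see.

Oracle and Prometheus both get GLM-5.1. Planning and deep thinking are where GLM-5.1 does its best work. It is neither the cheapest nor the most expensive model, but it performs well on this kind of open-ended decomposition task.

Hephaestus (the main coding agent) uses V4 Pro as primary with V4 Flash as fallback. The gap between the two is small, so falling back to Flash for simple coding work costs nothing you would notice.

Using MiMo-V2.5-Pro for Multimodal-Looker is deliberate. It scores 78.9% on SWE-Bench Verified and is built specifically for agentic workflows.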

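The routing decision rule

Any workload of more than 100 requests should go through V4 Flash first. If V4 Flash struggles, escalate to Kimi K2.6 or V4 Pro.

This works because V4 Flash, at 79.0% on SWE-Bench Verified, already solves the bulk of real-world coding tasks correctly. The 1.6-point gap to V4 Pro is real, but it rarely shows on routine work; it only appears on the hard problems. When it does, the fallback chain handles it.

Don't escalate early. Let the model fail, then escalate. Escalating early burns through your window in no time.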
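
The "fail first, then escalate" rule can be sketched as a simple loop over a fallback chain. This is an illustration of the idea only, not Oh My OpenAgent's actual implementation: `run_with_escalation`, `call_model`, and `fake_call` are hypothetical names, and the stand-in client just pretends that hard tasks need Kimi K2.6.

```python
from typing import Callable, Optional, Sequence


def run_with_escalation(
    task: str,
    chain: Sequence[str],
    call_model: Callable[[str, str], Optional[str]],
) -> tuple[str, str]:
    """Try each model in order; move to the next one only after a failure.

    `call_model(model, task)` is a hypothetical hook that returns the result,
    or None when the model fails the task.
    """
    for model in chain:
        result = call_model(model, task)
        if result is not None:
            return model, result
    raise RuntimeError(f"all models failed: {list(chain)}")


def fake_call(model: str, task: str) -> Optional[str]:
    """Stand-in for a real client: only kimi-k2.6 solves 'hard' tasks."""
    if "hard" in task and model != "kimi-k2.6":
        return None
    return f"{model} solved: {task}"


CHAIN = ["deepseek-v4-flash", "kimi-k2.6", "deepseek-v4-pro"]

easy_model, _ = run_with_escalation("rename a variable", CHAIN, fake_call)
hard_model, _ = run_with_escalation("hard concurrency bug", CHAIN, fake_call)
print(easy_model)
print(hard_model)
```

The cheap model handles the easy task on the first try, and the more expensive one is only paid for when the cheap one has actually failed.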

[Figure: Budget comparison]

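What $10 a month actually buys

With the $60 monthly cap, the math works out like this:

  • 5 hours a day across 5 days comes to 25 hours, which is five fully spent $12 windows ($60).
  • Per 5-hour window: a $12 budget.
  • With routing in place, a typical Oh My OpenAgent session of moderate complexity needs roughly 400 to 600 requests, weighted toward V4 Flash and Qwen3.5 Plus.

The result: 8 to 12 substantial coding sessions a month without hitting the cap. For a solo developer, $10 a month is enough. OpenCode reached 150K GitHub stars in May 2026, partly because the arithmetic works.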
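
One way to read the "8 to 12 sessions" figure is as an implied spend per session. The sketch below assumes, purely for illustration, that the $60 monthly cap is the binding limit and that spend is spread evenly across sessions; the article does not state a per-session cost.

```python
MONTHLY_CAP_USD = 60.00
WINDOW_CAP_USD = 12.00


def spend_per_session(sessions_per_month: int) -> float:
    """Even split of the monthly cap across sessions (an assumption)."""
    return MONTHLY_CAP_USD / sessions_per_month


full_windows = MONTHLY_CAP_USD / WINDOW_CAP_USD
low = spend_per_session(12)
high = spend_per_session(8)

print(f"Fully spent windows per month: {full_windows:.0f}")
print(f"Implied spend per session: ${low:.2f} to ${high:.2f}")
```

On that reading, each session stays well under the $12 window cap, which is consistent with sessions that lean mostly on the cheaper models.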

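For an honest comparison: at equivalent quality, the Claude API would run $150 to $300 a month. That is the basis of the "10 to 20x cheaper" claim, and in my own testing it holds up.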

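The honest trade-offs

This stack sits about 7 points behind Claude Opus 4.7 on real-world bug fixing. That is real. Some tickets will take several iterations where Claude would get them right in one. Factor that cost in.

The 7-point gap is an average across tasks. When a task is well specified and has clear acceptance criteria, the gap narrows. The routing config escalates to Kimi K2.6 or V4 Pro precisely on the tasks where the gap shows most.

Where this stack struggles: ambiguous requirements, many files with hidden dependencies, and work that requires understanding a system's implicit behavior. That is where premium models earn their price. The routing config puts Kimi K2.6 on the hardest work, but Kimi's context window is only 256K tokens versus 1M for Qwen3.6 Plus, so very long-context tasks may need a different assignment.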
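
A context-aware override for that last case might look like the sketch below. The context sizes come from the table earlier in the article; the function name and the 0.8 headroom factor are assumptions made for the example, not settings from Oh My OpenAgent.

```python
# Context windows in tokens, as listed in the model table.
CONTEXT_TOKENS = {"kimi-k2.6": 256_000, "qwen3.6-plus": 1_000_000}


def pick_for_context(
    prompt_tokens: int,
    preferred: str = "kimi-k2.6",
    long_context: str = "qwen3.6-plus",
    headroom: float = 0.8,  # hypothetical: leave 20% of the window for output
) -> str:
    """Use the preferred model unless the prompt would crowd its window."""
    if prompt_tokens <= CONTEXT_TOKENS[preferred] * headroom:
        return preferred
    if prompt_tokens <= CONTEXT_TOKENS[long_context] * headroom:
        return long_context
    raise ValueError("prompt is too large for either model")


short_choice = pick_for_context(150_000)
long_choice = pick_for_context(400_000)
print(short_choice)
print(long_choice)
```

The trade-off is that the long-context model scores lower on SWE-Bench Verified, so this kind of override only makes sense when the prompt genuinely does not fit.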

The actual config

Two files control everything: opencode.json at your project root, and .omc/config.json for Oh My OpenAgent's routing.

opencode.json

{
  "$schema": "https://opencode.ai/config.schema.json",
  "theme": "opencode",
  "autoshare": false,
  "model": "deepseek-v4-flash",
  "providers": {
    "opencode": {
      "models": [
        "deepseek-v4-pro",
        "deepseek-v4-flash",
        "kimi-k2.6",
        "glm-5.1",
        "qwen3.6-plus",
        "qwen3.5-plus",
        "mimo-v2.5-pro",
        "minimax-m2.5"
      ]
    }
  }
}
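
One easy mistake in a file shaped like this is a default "model" that does not appear in any provider's "models" list. The following is a minimal sanity check over a trimmed copy of the file above. It only inspects the JSON shape shown here; whether OpenCode itself performs such a check is not something this article states, and `default_model_is_listed` is a hypothetical helper.

```python
import json

# Trimmed copy of the opencode.json shown above.
OPENCODE_JSON = """
{
  "model": "deepseek-v4-flash",
  "providers": {
    "opencode": {
      "models": ["deepseek-v4-pro", "deepseek-v4-flash", "kimi-k2.6"]
    }
  }
}
"""


def default_model_is_listed(config: dict) -> bool:
    """True if the top-level "model" appears in some provider's "models" list."""
    listed = {
        name
        for provider in config.get("providers", {}).values()
        for name in provider.get("models", [])
    }
    return config.get("model") in listed


config = json.loads(OPENCODE_JSON)
is_listed = default_model_is_listed(config)
print(is_listed)
```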

The "model" field sets your default. V4 Flash is the right choice because it handles most tasks at the lowest cost.

.omc/config.json

{
  "version": "4.2.3",
  "teamMode": false,
  "agents": {
    "sisyphus": {
      "model": "kimi-k2.6",
      "fallback": ["deepseek-v4-pro", "qwen3.6-plus"],
      "thinkingBudget": 32000
    },
    "hephaestus": {
      "model": "deepseek-v4-pro",
      "fallback": ["deepseek-v4-flash", "kimi-k2.6"]
    },
    "oracle": {
      "model": "glm-5.1",
      "fallback": ["kimi-k2.6", "deepseek-v4-pro"]
    },
    "prometheus": {
      "model": "glm-5.1",
      "fallback": ["qwen3.6-plus", "deepseek-v4-pro"]
    },
    "metis": {
      "model": "qwen3.6-plus",
      "fallback": ["deepseek-v4-pro"]
    },
    "atlas": {
      "model": "deepseek-v4-pro",
      "fallback": ["deepseek-v4-flash"]
    },
    "librarian": {
      "model": "deepseek-v4-flash",
      "fallback": ["qwen3.5-plus"]
    },
    "explore": {
      "model": "deepseek-v4-flash",
      "fallback": []
    },
    "code-reviewer": {
      "model": "kimi-k2.6",
      "fallback": ["deepseek-v4-pro"]
    },
    "multimodal-looker": {
      "model": "mimo-v2.5-pro",
      "fallback": ["qwen3.6-plus"]
    }
  },
  "routing": {
    "escalationPolicy": "on-failure",
    "budgetAlert": 10.00,
    "windowBudget": 12.00
  }
}


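escalationPolicy: "on-failure" does exactly what it says: a fallback model is used only when the primary fails, never preemptively. budgetAlert fires at $10, which tells you that $2 remains in the window before you hit the cap.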
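
How the two thresholds relate can be sketched in a few lines. This is an illustration of the budgetAlert and windowBudget values from the config above, not the tool's own code; `window_status` is a hypothetical helper.

```python
def window_status(
    spent_usd: float,
    budget_alert: float = 10.00,   # "budgetAlert" in .omc/config.json
    window_budget: float = 12.00,  # "windowBudget" in .omc/config.json
) -> str:
    """Classify the current window's spend against the two thresholds."""
    if spent_usd >= window_budget:
        return "exhausted"
    if spent_usd >= budget_alert:
        return f"alert: ${window_budget - spent_usd:.2f} left"
    return "ok"


print(window_status(4.25))
print(window_status(10.00))
print(window_status(12.00))
```

A status like this is the kind of signal you would want before deciding whether an escalation to a pricier model still fits in the window.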

Quick start

# Install OpenCode Go
npm install -g opencode

# Install Oh My OpenAgent
npx omc install oh-my-openagent

# Create opencode.json and .omc/config.json from the templates above, then:
omc init --preset oh-my-openagent


# Check your current window spend
opencode usage --window current


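Knowing where you stand within the $12 window changes how you approach escalating to premium models.

For the full details of the original setup, my starting point was Jatin Malik's guide: OpenCode Go + Oh My OpenAgent: The Complete Guide to SOTA Model Routing Without Hitting Limits. It covers the earlier v4.0-v4.1 configuration in detail and is worth reading alongside this article.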