

























Author here.
This article came from looking at internal runs of an AI engineering workflow and noticing that evidence acquisition can happen surprisingly late in the process.
The two token observations in the article are not presented as a benchmark. They are just examples that led me to think the problem is not only “which model is cheaper”, but also “when does the system acquire evidence”.
I’m especially interested in criticism of the framing: whether evidence-first review is the right abstraction, or whether this should be treated purely as a routing / context-management problem.
此内容由惯性聚合(RSS阅读器)自动聚合整理,仅供阅读参考。 原文来自 — 版权归原作者所有。