惯性聚合 高效追踪和阅读你感兴趣的博客、新闻、科技资讯
阅读原文 在惯性聚合中打开

推荐订阅源

奇客Solidot–传递最新科技情报
奇客Solidot–传递最新科技情报
Application and Cybersecurity Blog
Application and Cybersecurity Blog
S
Securelist
K
Kaspersky official blog
Scott Helme
Scott Helme
C
CXSECURITY Database RSS Feed - CXSecurity.com
GbyAI
GbyAI
钛媒体:引领未来商业与生活新知
钛媒体:引领未来商业与生活新知
C
Cisco Blogs
freeCodeCamp Programming Tutorials: Python, JavaScript, Git & More
让小产品的独立变现更简单 - ezindie.com
让小产品的独立变现更简单 - ezindie.com
博客园 - Franky
Security Latest
Security Latest
Threat Intelligence Blog | Flashpoint
Threat Intelligence Blog | Flashpoint
Y
Y Combinator Blog
T
Threat Research - Cisco Blogs
L
LINUX DO - 热门话题
C
Cyber Attacks, Cyber Crime and Cyber Security
Project Zero
Project Zero
Cisco Talos Blog
Cisco Talos Blog
月光博客
月光博客
I
Intezer
cs.CL updates on arXiv.org
cs.CL updates on arXiv.org
人人都是产品经理
人人都是产品经理
L
Lohrmann on Cybersecurity
Recorded Future
Recorded Future
Latest news
Latest news
V2EX - 技术
V2EX - 技术
T
The Exploit Database - CXSecurity.com
H
Heimdal Security Blog
F
Fortinet All Blogs
Cloudbric
Cloudbric
IT之家
IT之家
博客园 - 叶小钗
Microsoft Security Blog
Microsoft Security Blog
P
Proofpoint News Feed
博客园 - 司徒正美
Apple Machine Learning Research
Apple Machine Learning Research
PCI Perspectives
PCI Perspectives
AWS News Blog
AWS News Blog
H
Help Net Security
S
Security @ Cisco Blogs
酷 壳 – CoolShell
酷 壳 – CoolShell
Recent Announcements
Recent Announcements
Hacker News - Newest:
Hacker News - Newest: "LLM"
cs.CV updates on arXiv.org
cs.CV updates on arXiv.org
F
Full Disclosure
S
Schneier on Security
S
Security Affairs
T
Tenable Blog

VentureBeat

Anthropic says it hit a $30 billion revenue run rate after 'crazy' 80x growth OpenAI voice models get GPT-5-class reasoning AI agent identity: how to govern agentic AI in 6 stages Anthropic wants to own your agent's memory, evals, and orchestration — and that should make enterprises nervous Enterprise GPU utilization: why 95% of AI infrastructure spend is wasted Governance, not gatekeeping: How SAP brings enterprise‑grade safety to AI connectivity Anthropic introduces "dreaming," a system that lets AI agents learn from their own mistakes RL orchestration: how a 7B model routes tasks across GPT-5, Claude, and Gemini Meet ZAYA1-8B, a super efficient open reasoning model trained on AMD Instinct MI300 GPUs Anthropic Skill scanners passed every check. The malicious code rode in on a test file. Why AI breaks without context — and how to fix it Market research is too slow for the AI era, so Brox built 60,000 identical 'digital twins' of real people you can survey instantly, repeatedly The app store for robots has arrived: Hugging Face launches open-source Reachy Mini App Store with 200+ apps Scaling AI into production is forcing a rethink of enterprise infrastructure Miami startup Subquadratic claims 1,000x AI efficiency gain with SubQ model; researchers demand independent proof. GPT-5.5 Instant shows you what it remembered — just not all of it One command turns any open-source repo into an AI agent backdoor. OpenClaw proved no supply-chain scanner has a detection category for it AI agents are missing all the discussions your team is having. SageOX has an answer: agentic context infrastructure OpenAI turns its sold-out GPT-5.5 party into a monthlong Codex giveaway for 8,000 developers Inside AMEX’s agentic commerce stack: How intent contracts and single-use tokens enforce AI transactions Microsoft takes Agent 365 out of preview as shadow AI becomes an enterprise threat The RAG era is ending for agentic AI — a new compilation-stage knowledge layer is what comes next Salesforce Agentforce Operations fixes workflows breaking enterprise AI MCP command execution flaw: what security teams need to know The scaffolding era is over. LlamaIndex says context is the new moat xAI launches Grok 4.3 at an aggressively low price and a new, fast, powerful voice cloning suite Hidden IT problems are quietly creating risk, shadow IT, and lost productivity Alibaba's HDPO cuts AI agent tool overuse from 98% to 2% One tool call to rule them all? New open source Python tool Runpod Flash eliminates containers for faster AI dev Why OpenAI's 'goblin' problem matters — and how you can release the goblins on your own AI coding agents breached: attackers targeted credentials, not models | VentureBeat Writer launches AI agents that can act without prompts, taking on Amazon, Microsoft and Salesforce Netomi raises $110 million as Accenture and Adobe bet on AI for customer service Cheaper tokens, bigger bills: The new math of AI infrastructure Amazon’s OpenAI gambit signals a new phase in the cloud wars — one where exclusivity no longer applies Enterprise RAG rebuild: hybrid retrieval adoption tripled in Q1 2026 IBM launches Bob with multi-model routing and human checkpoints to turn AI coding into a secure production system AWS Quick's knowledge graph creates an orchestration blind spot Why enterprise GPU utilization is stuck at 5% — and why the fix makes it worse Definity embeds agents inside Spark pipelines to catch failures before they reach agentic AI systems How to build custom reasoning agents with a fraction of the compute American AI startup Poolside launches free, high-performing open model Laguna XS.2 for local agentic coding Mistral AI launches Workflows, a Temporal-powered orchestration engine already running millions of daily executions Microsoft and OpenAI gut their exclusive deal, freeing OpenAI to sell on AWS and Google Cloud Open source Xiaomi MiMo-V2.5 and V2.5-Pro are among the most efficient (and affordable) at agentic 'claw' tasks AI framework autonomously outperforms human-designed R&D baselines Why supply chains are the proving ground for automation‑led iPaaS RAG precision tuning can quietly cut retrieval accuracy by 40%, putting agentic pipelines at risk Enterprises are obsessing over model accuracy while ignoring the infrastructure layer where AI systems actually break. Monitoring LLM behavior: Drift, retries, and refusal patterns CVSS vulnerability triage: 5 failures, 5 fixes DeepSeek-V4 arrives with near state-of-the-art intelligence at fraction of the cost of Opus 4.7, GPT-5.5 85% of enterprises are running AI agents. Only 5% trust them enough to ship. AI synthetic audiences are already here and poised to upend the consulting industry Mystery solved: Anthropic reveals changes to Claude's harnesses and operating instructions likely caused degradation OpenAI's GPT-5.5 is here, and it's no potato: narrowly beats Anthropic's Claude Mythos Preview on Terminal-Bench 2.0 New startup BAND debuts agentic mesh with deterministic routing to govern multiple enterprise AI agents across model providers, channels OpenAI unveils Workspace Agents, a successor to custom GPTs for enterprises that can plug directly into Slack, Salesforce and more Google and AWS split the AI agent stack between control and execution Are you paying an AI ‘swarm tax’? Why single agents often beat complex systems OpenAI launches Privacy Filter, an open source, on-device data sanitization model that removes personal information from enterprise datasets Google doesn't pay the Nvidia tax. Its new TPUs explain why. Salesforce’s Agentforce Vibes 2.0 targets a hidden failure: context overload in AI agents Google’s Gemini can now run on a single air-gapped server — and vanish when you pull the plug The modern data stack was built for humans asking questions. Google just rebuilt its for agents taking action. Google’s new Deep Research and Deep Research Max agents can search the web and your private data Vercel breach exposes the OAuth gap most security teams cannot detect, scope or contain The AI governance mirage: Why 72% of enterprises don’t have the control and security they think they do OpenAI's ChatGPT Images 2.0 is here and it does multilingual text, full infographics, slides, maps, even manga — seemingly flawlessly Kimi K2.6 runs agents for days — and exposes the limits of enterprise orchestration What AI model should you use for revenue intelligence? Von says all the big ones, and it will automate mixing and matching for you Three AI coding agents leaked secrets through a single prompt injection. One vendor's system card predicted it Train-to-Test scaling explained: How to optimize your end-to-end AI compute budget for inference AI agent security maturity audit: enterprises funded stage one, stage-three threats arrived anyway Anthropic just launched Claude Design, an AI tool that turns prompts into prototypes and challenges Figma Should my enterprise AI agent do that? NanoClaw and Vercel launch easier agentic policy setting, approval dialogs for messaging apps Salesforce launches Headless 360 to turn its entire platform into infrastructure for AI agents Are we getting what we paid for? How to turn AI momentum into measurable value OpenAI debuts GPT-Rosalind, a new limited access model for life sciences, and broader Codex plugin on Github OpenAI drastically updates Codex desktop app to use all other apps on your computer, generate images, preview webpages Anthropic releases Claude Opus 4.7, narrowly retaking lead for most powerful generally available LLM AI lowered the cost of building software. Enterprise governance hasn’t caught up Microsoft patched a Copilot Studio prompt injection. The data exfiltrated anyway Frontier models are failing one in three production attempts — and getting harder to audit Meta researchers introduce 'hyperagents' to unlock self-improving AI for non-coding tasks We tested Anthropic’s redesigned Claude Code desktop app and 'Routines' -- here's what enterprises should know AI's next bottleneck isn't the models — it's whether agents can think together Adobe’s new Firefly AI Assistant wants to run Photoshop, Premiere, Illustrator and more from one prompt Traza raises $2.1 million led by Base10 to automate procurement workflows with AI Agentic coding at enterprise scale demands spec-driven development Designing the agentic AI enterprise for measurable performance Five signs data drift is already undermining your security models Your developers are already running AI locally: Why on-device inference is the CISO’s new blind spot AI agent credentials live in the same box as untrusted code. Two new architectures show where the blast radius actually stops. Intuit compressed months of tax code implementation into hours — and built a workflow any regulated-industry team can adapt OpenAI introduces ChatGPT Pro $100 tier with 5X usage limits for Codex compared to Plus Mythos autonomously exploited vulnerabilities that survived 27 years of human review. Security teams need a new detection playbook Claude, OpenClaw and the new reality: AI agents are here — and so is the chaos Goodbye, Llama? Meta launches new proprietary AI model Muse Spark — first since Superintelligence Labs' formation LLM-referred traffic converts at 30-40% — and most enterprises aren't optimizing for it
LLM reasoning, automated: tokens drop 69.5%
Ben Dickson · 2026-05-29 · via VentureBeat

Test-time scaling (TTS) has emerged as a proven method to improve the performance of large language models in real-world applications by giving them extra compute cycles at inference time. However, TTS strategies have historically been handcrafted, relying heavily on human intuition to dictate the rules of the model’s reasoning. 

To address this bottleneck, researchers from Meta, Google, and several universities have introduced AutoTTS, a framework that automatically discovers optimal TTS strategies. This automated approach allows enterprise organizations to dynamically optimize compute allocation without manually tuning heuristics. 

By implementing the optimal strategies discovered by AutoTTS, organizations can directly reduce the token usage and operational costs of deploying advanced reasoning models in production environments. In experimental trials, AutoTTS managed inference budgets efficiently, successfully reducing token consumption by up to 69.5% without sacrificing accuracy.

The manual bottleneck in test-time scaling

Test-time scaling enhances LLMs by granting them extra compute when generating answers. This extra compute allows the model to generate multiple reasoning paths or evaluate its intermediate steps before arriving at a final response. 

The primary challenge for designing TTS strategies is determining how to allocate this extra computation optimally. Historically, researchers have designed these strategies manually, relying on guesswork to build rigid heuristics. Engineers must hypothesize the rules and thresholds for when a model should branch out into new reasoning paths, probe deeper into an existing path, prune an unpromising branch, or stop reasoning altogether. 

Because this manual tuning process is constrained by human intuition, a vast amount of possible approaches remain unexplored. This often results in suboptimal trade-offs between model accuracy and computing costs.

Current TTS algorithms can be mapped to a width-depth control space — "width" being the number of reasoning branches explored, "depth" being how far each develops. Self-consistency (SC) samples a fixed number of trajectories and majority-votes the answer. Adaptive-consistency (ASC) saves compute by stopping early once a confidence threshold is hit. Parallel-probe takes a more granular approach, pruning unpromising branches while deepening the rest. All three are hand-crafted, and that's the constraint AutoTTS is designed to break.

While some more advanced methods employ richer structures like tree search or external verifiers, they all share one key characteristic: they are meticulously hand-crafted. This manual approach restricts the scope of strategy discovery, leaving a massive portion of the potential resource-allocation space untouched.

Automating strategy discovery with AutoTTS

AutoTTS reframes the way test-time scaling is optimized. Instead of treating strategy design as a human task, AutoTTS approaches it as an algorithmic search problem within a controlled environment. 

This framework redefines the roles of both the human engineer and the AI model. Rather than hand-crafting specific rules for when an LLM should branch, prune, or stop reasoning, the engineer's role shifts to constructing the discovery environment. The human defines the boundaries, including the control space of states and actions, optimization objectives balancing accuracy versus cost, and the specific feedback mechanisms. 

autotts.

AutoTTS framework (source: arXiv)

An explorer LLM, such as Claude Code, designs the strategy. This explorer acts as an autonomous agent that iteratively proposes TTS “controllers.” These controllers are code-defined policies or algorithms that dictate how an AI model allocates its computational budget during inference. The explorer tests and refines these controllers based on feedback until it discovers an optimal resource-allocation policy. 

To make this automated search computationally affordable, AutoTTS relies on an “offline replay environment.” If the explorer LLM had to invoke a base reasoning model to generate new tokens every time it tested a new strategy, the compute costs would be astronomical. Instead, it relies on thousands of reasoning trajectories pre-collected from the base LLM. These trajectories include "probe signals," which are intermediate answers that help the controller evaluate progress across different reasoning branches. 

During the discovery loop, the explorer agent proposes a controller and evaluates it against this offline data. The agent observes the execution traces of the proposed controller that show it allocated compute over time. By analyzing these traces, the agent can diagnose specific failure modes, such as noting if a controller pruned branches too aggressively in a specific scenario. This provides an advantage over just viewing a final result. The agent then iteratively rewrites its code to improve the accuracy-cost tradeoff. 

Inside the AI-designed controller

Because the explorer agent is not constrained by human intuition, it can discover highly coordinated, complex rules that a human engineer would likely never hand-code. One optimal controller discovered by AutoTTS, named the Confidence Momentum Controller, leverages several non-obvious mechanisms to manage compute:

  • Trend-based stopping: Hand-crafted strategies often instruct the model to stop reasoning once it hits a certain instantaneous confidence threshold. The AutoTTS agent discovered that instantaneous confidence can be misleading due to temporary spikes. Instead, the controller tracks an exponential moving average (EMA) of confidence and only stops if the overall confidence level is high and the trend is not actively declining.

  • Coupled width-depth control: Manually designed algorithms usually treat the "widening" of new reasoning paths and the "deepening" of current paths as separate decisions. AutoTTS discovered a closed feedback loop where the two actions are linked. If the confidence of the current branches stalls or regresses, the controller automatically triggers the spawning of new branches.

  • Alignment-aware depth allocation: Instead of giving all active reasoning branches an equal computation budget, the controller dynamically identifies which branches agree with the current leading answer. It then gives those branches priority "bursts" of extra computation. This concentrates the computational budget on the emerging consensus to quickly verify if it is correct.

Cost savings and accuracy gains in real-world benchmarks

To test whether an AI could autonomously discover a better test-time scaling strategy, researchers set up a rigorous evaluation framework. The core experiments were conducted on Qwen3 models ranging from 0.6B to 8B parameters. The researchers also tested the system's ability to generalize on a distilled 8B version of the DeepSeek-R1 model. 

The explorer AI agent was initially tasked with discovering an optimal strategy using the AIME24 mathematical reasoning benchmark. This discovered strategy was then tested on two held-out math benchmarks, AIME25 and HMMT25, as well as the graduate-level general reasoning benchmark GPQA-Diamond. 

The AutoTTS discovered controller was pitted against four manually designed test-time scaling algorithms in the industry. These baselines included Self-Consistency with 64 parallel reasoning paths (SC@64), Adaptive-Consistency (ASC), Parallel-Probe, and Early-Stopping Self-Consistency (ESC). ESC is a hybrid approach that generates trajectories in parallel and stops early when an answer seems stable.

scaling curves

AutoTTS (red line) outperforms other baselines on industry benchmarks (source: arXiv)

When set to a balanced, cost-conscious mode, the AutoTTS-discovered controller reduced total token consumption by approximately 69.5% compared to SC@64. At the same time, the controller maintained the same average accuracy across the four Qwen models. When the inference budget was turned up, AutoTTS pushed peak accuracy beyond all handcrafted baselines in five out of eight test cases.

VB Transform · July 14–15 · Menlo Park · Agentic orchestration

Intuit rebuilt its multi-agent system in 60 days. What did they change — and why?

At Transform, engineering leaders from Intuit, Target, and Instacart break down how they redesigned their orchestration architectures for reliability, scale, and real customers.

See the full agenda →

This efficiency translated to other tasks. On the GPQA-Diamond benchmark, the balanced AutoTTS variant slashed the inference token cost from 510K tokens down to just 151K tokens, while slightly improving overall accuracy. On the DeepSeek model, AutoTTS achieved the highest overall accuracy on the HMMT25 benchmark while cutting the token spend nearly in half.

For practitioners building enterprise AI applications, these experiments highlight two major operational benefits:

  • Raising peak performance: AutoTTS doesn't just save money on token consumption. It actively raises the peak attainable performance of the base model. The AI-designed controller is remarkably good at detecting noisy or unproductive reasoning branches on the fly and continuously redirecting its compute budget toward the branches generating the most useful reasoning signals.

  • Cost-effective custom development: Because the framework relies on an offline replay environment, the entire discovery process cost only $39.90 and took 160 minutes. For enterprise teams, that means optimized reasoning strategies tailored to proprietary models and internal tasks are now within reach — without a dedicated research budget.

Both the AutoTTS framework and the Confidence Momentum Controller are available on GitHub; the CMC can be used as a drop-in replacement for other TTS controllers.