Stanford's DeLM cuts multi-agent costs 50%
Taryn Plumb · 2026-06-17 · via VentureBeat

One of the assumptions behind today’s AI frameworks is that agents require a “boss” at the center; this orchestrator runs the show, routes requests, and makes sure the whole system doesn’t descend into chaos.

That assumption may be wrong, and the cost of carrying it could be measured in inference dollars and coordination latency. A new Stanford framework called DeLM (decentralized language model) is built on the premise that agents can coordinate directly, without routing every update through a central controller.

In a research paper, the framework's co-developers, Yuzhen Mao and Azalia Mirhoseini, explain that DeLM's shared knowledge base serves as a “common communication substrate.” Agents build on one another’s verified progress instead of routing every interaction through a main agent that must “merge, filter, and rebroadcast” it.

The researchers argue that such a system is not only feasible but, in certain settings, preferable: “Agents can build on prior findings, avoid repeated failures, preserve constraints, and recover detailed evidence only when needed.”

The challenges of traditional multi-agent systems

In a typical centralized multi-agent system, a main agent breaks a task into subtasks, assigns them to multiple sub-agents in parallel, and waits for their responses. It then merges and summarizes the intermediate progress and launches the next wave of assignments based on the collected context.

While this is a natural way to parallelize LLM reasoning, the Stanford researchers argue that it scales poorly. Every useful finding, partial finding, and failure must be reported back to the main agent, which then decides what to merge and rebroadcast to the agents below it.

“As the number of subtasks grows, this controller becomes a communication and integration bottleneck,” Mao and Mirhoseini write. Further, the main orchestrator may “dilute, omit, or distort” useful information, leading to lost progress.

The same bottleneck appears in long-context reasoning. Once it receives reports back from sub-agents, a main agent will typically group related concepts, data points, and other materials together in an unsupervised learning loop. It may then pre-assign these “evidence clusters” to sub-agents before it knows which of the surfaced material is actually relevant or whether it has been grouped correctly.

When a sub-agent receives insufficient context, it effectively gets confused and returns to the main agent, kicking off another round of retrieval or delegation. “This back-and-forth makes coordination slower, more iterative, and increasingly constrained by a single overloaded main agent,” the researchers write.
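The following minimal Python sketch shows the centralized pattern described above and makes the bottleneck concrete. It is not code from the paper. `Controller`, `run_wave`, and the toy sub-agent are hypothetical stand-ins, and a plain function takes the place of a real LLM call. The point is only that every report, useful or not, passes through a single object that must read and merge it.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional, Sequence, Tuple

# Hypothetical sub-agent signature: (subtask, context rebroadcast by the controller) -> report or None
SubAgent = Callable[[str, Tuple[str, ...]], Optional[str]]


@dataclass
class Controller:
    """Illustrative central orchestrator; not the paper's baseline implementation."""

    merged_context: list = field(default_factory=list)
    reports_integrated: int = 0  # every report, useful or not, funnels through here

    def run_wave(self, subtasks: Sequence[str], sub_agent: SubAgent) -> list:
        # Fan out (sequential here for simplicity; real sub-agents would run in parallel).
        # Each sub-agent sees only what the controller chose to rebroadcast.
        snapshot = tuple(self.merged_context)
        reports = [sub_agent(task, snapshot) for task in subtasks]
        # Fan in: the controller alone reads, filters, and merges every report.
        for report in reports:
            self.reports_integrated += 1
            if report is not None:  # the controller decides what survives
                self.merged_context.append(report)
        return reports


# Usage with a toy sub-agent that just echoes its subtask.
ctrl = Controller()
for wave in range(3):
    ctrl.run_wave([f"task-{wave}-{i}" for i in range(4)], lambda task, ctx: f"finding for {task}")
print(ctrl.reports_integrated)  # 12
```

With three waves of four subtasks, the controller integrates 12 reports. Doubling the subtasks doubles that load, and all of it still lands on one agent.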


What DeLM addresses and how it works

DeLM, by contrast, is built around parallel agents, a shared context, and a task queue.

Shared context is essentially a curated store of “gists,” or information summaries that other agents might find useful. These include verified and evidence-based findings alongside partial findings and documented failures; they also point to detailed evidence that agents can pull from based on their specific task.
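Here is one way to picture a gist, assuming a simple in-memory store. Each entry records what kind of progress it represents, a compact summary, and a pointer to the detailed evidence. `Gist`, `GistKind`, and the `trace://` references below are illustrative names, not DeLM's actual schema.

```python
from dataclasses import dataclass
from enum import Enum


class GistKind(Enum):
    VERIFIED = "verified"  # evidence-backed finding
    PARTIAL = "partial"    # incomplete progress others can build on
    FAILURE = "failure"    # documented dead end


@dataclass(frozen=True)
class Gist:
    kind: GistKind
    summary: str       # the compact text agents read by default
    evidence_ref: str  # pointer to detailed evidence (trace, file, document span)


# Hypothetical shared context with entries written by different agents.
shared_context = [
    Gist(GistKind.VERIFIED, "Test fails only when the cache is cold", "trace://run-7"),
    Gist(GistKind.FAILURE, "Raising the request timeout did not fix the test", "trace://run-3"),
]


def gists_of(context, kind: GistKind) -> list:
    """Return the entries of one kind, e.g. so an agent can review known failures."""
    return [g for g in context if g.kind == kind]


print([g.summary for g in gists_of(shared_context, GistKind.FAILURE)])
# ['Raising the request timeout did not fix the test']
```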

The task queue holds the pending subtasks, which agents claim independently.
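Claiming must be atomic so that two agents never pick up the same subtask. The sketch below is a minimal thread-safe version that assumes agents run as threads in one process. `TaskQueue`, `claim`, and `owner` are hypothetical names, and a distributed deployment would need a shared store instead of an in-process lock.

```python
import threading
from collections import deque


class TaskQueue:
    """Hypothetical queue that agents claim from directly, with no controller assigning work."""

    def __init__(self, tasks):
        self._pending = deque(tasks)
        self._owners = {}  # task -> id of the agent that claimed it
        self._lock = threading.Lock()

    def claim(self, agent_id):
        # Pop and record under one lock so no task is ever claimed twice.
        with self._lock:
            if not self._pending:
                return None
            task = self._pending.popleft()
            self._owners[task] = agent_id
            return task

    def add(self, task):
        with self._lock:
            self._pending.append(task)

    def owner(self, task):
        with self._lock:
            return self._owners.get(task)


# Usage: eight threads race to drain 100 tasks; each task is claimed exactly once.
q = TaskQueue(range(100))
claimed = []
claimed_lock = threading.Lock()


def agent(agent_id):
    while (task := q.claim(agent_id)) is not None:
        with claimed_lock:
            claimed.append(task)


threads = [threading.Thread(target=agent, args=(i,)) for i in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(sorted(claimed) == list(range(100)))  # True
```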

“Agents write compact, verified updates into a shared context that later agents can read directly,” the researchers write. Useful findings, failures, and constraints accumulate as a “shared problem state,” rather than passing through a central controller.

The pipeline looks like this:

  • Initialization: Inputs are broken into work units and added to a queue.

  • Parallel execution: Agents work independently and in parallel, claiming tasks from the queue and reading the shared context as they go.

  • Compression and verification: Results are compressed into reusable “gists” and checked against supporting evidence. Only fully verified gists are shared with the group.

  • Additional work (if needed): When the queue is empty, the last agent to return an answer inspects the full shared context to decide whether further work is required.

  • Final step: Once that agent determines that no further work is required, it returns the final answer.
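The five stages above can be sketched in a few dozen lines of Python. This is a toy model built on stated assumptions, not the researchers' implementation. Agents are threads, and `solve`, `verify`, and `plan_more` are caller-supplied stand-ins for LLM calls. A single condition variable guards the queue and the shared context. The agent whose completion leaves the queue empty with nothing in flight plays the "last agent" role.

```python
import threading
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Gist:
    summary: str   # compact result other agents read
    evidence: str  # supporting evidence the gist is checked against


class DecentralizedRun:
    """Toy DeLM-style loop: a shared task queue plus shared context, no central controller."""

    def __init__(self, tasks, solve, verify, plan_more):
        if not tasks:
            raise ValueError("need at least one initial work unit")
        self.queue = deque(tasks)  # 1. initialization: work units go into a queue
        self.shared = []           # verified gists only
        self.in_flight = 0
        self.answer = None
        self.cond = threading.Condition()
        self.solve, self.verify, self.plan_more = solve, verify, plan_more

    def worker(self):
        while True:
            with self.cond:
                while not self.queue and self.answer is None:
                    self.cond.wait()  # idle until new work appears or the run ends
                if self.answer is not None:
                    return
                task = self.queue.popleft()
                self.in_flight += 1
                snapshot = list(self.shared)  # read shared context directly
            gist = self.solve(task, snapshot)  # 2. parallel execution, outside the lock
            ok = self.verify(gist)             # 3. check the gist against its evidence
            with self.cond:
                if ok:
                    self.shared.append(gist)   # only verified gists are shared
                self.in_flight -= 1
                if not self.queue and self.in_flight == 0:
                    # 4. this agent finished last: inspect all shared context
                    more = self.plan_more(list(self.shared))
                    if more:
                        self.queue.extend(more)
                    else:
                        # 5. nothing left to do: return the final answer
                        self.answer = [g.summary for g in self.shared]
                self.cond.notify_all()

    def run(self, n_agents=4):
        threads = [threading.Thread(target=self.worker) for _ in range(n_agents)]
        for t in threads:
            t.start()
        for t in threads:
            t.join()
        return self.answer


calls = {"plan": 0}


def solve(task, context):
    return Gist(summary=f"{task}: done", evidence=f"log:{task}")


def verify(gist):
    return gist.evidence.startswith("log:")  # stand-in check: gist must cite evidence


def plan_more(shared):
    calls["plan"] += 1
    return ["follow-up"] if calls["plan"] == 1 else []  # one extra round, then stop


print(sorted(DecentralizedRun(["a", "b", "c"], solve, verify, plan_more).run(n_agents=3)))
# ['a: done', 'b: done', 'c: done', 'follow-up: done']
```

Because `solve` and `verify` run outside the lock, agents only serialize on the brief reads and writes of shared state, not on one another's work.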

Agents “exchange progress through shared state, asynchronously claim ready tasks, and scale more adaptively as the number of subtasks grows,” the researchers explain.

How DeLM performs in the wild

With DeLM, agents can avoid redundant exploration; reuse and build on each other’s discoveries and failures; and focus on unresolved issues.

The framework can be particularly useful for test-time scaling in software engineering, where models are given extra time to “think” to improve their reasoning and problem-solving. Different agents can explore their own hypotheses or reasoning paths in parallel while still sharing intermediate progress. Concurrent debugging is one example.

DeLM is also suited to long-context reasoning and multi-document question answering. Agents can examine their own evidence clusters (collections of papers, code, or other materials) in parallel while maintaining a “global compact view” of the accumulated evidence.

The researchers contend that DeLM makes agentic tasks more accurate and significantly cheaper, and their benchmark results support the claim. SWE-bench Verified evaluates how well AI models and agents solve real-world software engineering problems. On that benchmark, DeLM performed 10.5% better than the strongest baseline and cut cost per task by roughly 50%.

The gains extend beyond coding. LongBench-v2 Multi-Doc QA assesses LLMs’ ability to handle long-context, real-world problems. There, DeLM had the highest accuracy across four model families: GPT-5.4, Claude Sonnet, Gemini Flash, and DeepSeek-V4-Pro.

On X, Mao detailed several reasons why DeLM outperforms other approaches on SWE-bench.

First, agents share failures. In ordinary parallel runs, when one agent follows the wrong path, that failure stays private, and subsequent agents may waste time (and money) pursuing the same dead end. But with DeLM, failed hypotheses are written into shared context.

“Later agents can read them as constraints, avoid repeated exploration, and redirect their search toward more promising fixes,” Mao said.
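Here is a minimal sketch of that idea, assuming failures are recorded as plain hypothesis strings. Before trying a fix, an agent checks a shared log and skips anything another agent has already seen fail. `FailureLog` and the candidate fixes are hypothetical.

```python
import threading
from typing import Iterable, Optional


class FailureLog:
    """Hypothetical shared record of failed hypotheses, readable by every agent."""

    def __init__(self):
        self._failed = set()
        self._lock = threading.Lock()

    def record(self, hypothesis: str) -> None:
        with self._lock:
            self._failed.add(hypothesis)

    def next_untried(self, candidates: Iterable[str]) -> Optional[str]:
        # Treat known failures as constraints: skip dead ends other agents already hit.
        with self._lock:
            for hypothesis in candidates:
                if hypothesis not in self._failed:
                    return hypothesis
        return None


log = FailureLog()
log.record("raise the request timeout")  # agent 1's dead end
log.record("patch parse_date()")         # agent 2's dead end
print(log.next_untried(["raise the request timeout", "patch parse_date()", "warm the cache first"]))
# warm the cache first
```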

Second, verified constraints are immediately added to the shared context, where they become binding shared state. “Later agents inherit them, build around them, and avoid repeating globally invalid simplifications,” Mao said.
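As an assumption, binding constraints can be modeled as named predicates that every later candidate must pass. In this hypothetical sketch, an agent describes a proposed patch as a dict. The shared store then reports which verified constraints the patch would break before any compute is spent testing it.

```python
from typing import Callable, Dict, List

Candidate = Dict[str, object]  # hypothetical description of a proposed fix
Check = Callable[[Candidate], bool]


class ConstraintStore:
    """Hypothetical shared store of verified constraints; later agents inherit all of them."""

    def __init__(self):
        self._checks: Dict[str, Check] = {}

    def add_verified(self, name: str, check: Check) -> None:
        self._checks[name] = check

    def violations(self, candidate: Candidate) -> List[str]:
        # Names of every binding constraint this candidate would break.
        return [name for name, check in self._checks.items() if not check(candidate)]


store = ConstraintStore()
store.add_verified("keeps public API", lambda c: not c["changes_signature"])
store.add_verified("handles None input", lambda c: c["handles_none"])

candidate = {"changes_signature": True, "handles_none": True}
print(store.violations(candidate))  # ['keeps public API']
```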

Crucially, DeLM keeps shared progress compact enough to reuse. The shared context is “unfoldable”: agents see short gists by default but can choose to expand them into more detailed summaries and raw evidence.

As the researchers note, providing all raw documents and traces gives agents the maximum amount of information, but that can overwhelm their context windows and ultimately increase costs.

“If agents shared full traces, each worker would need to read long command histories, file dumps, failed edits, and intermediate reasoning, turning coordination itself into another long-context bottleneck,” Mao said.

On the other hand, while sharing compact summaries is cheaper, important details and evidence can be lost, resulting in less reliable reasoning.

Unfolding therefore provides “coarse-to-fine,” opt-in access to detail, which can improve accuracy while keeping costs down.
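Coarse-to-fine access can be sketched as a gist with progressively more detailed levels. The three levels, `UnfoldableGist`, and `build_context` are assumptions made for illustration. Every entry appears in an agent's context as a short gist, and only the entries the agent selects are unfolded.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class UnfoldableGist:
    gist: str    # level 0: what every agent sees by default
    detail: str  # level 1: a fuller summary
    raw: str     # level 2: raw evidence such as a trace or file dump

    def view(self, level: int = 0) -> str:
        levels = (self.gist, self.detail, self.raw)
        return levels[max(0, min(level, len(levels) - 1))]  # clamp to a valid level


def build_context(entries: Sequence[UnfoldableGist],
                  wants_detail: Callable[[UnfoldableGist], bool],
                  level: int = 1) -> str:
    # Every entry contributes its gist; only selected entries are unfolded.
    return "\n".join(e.view(level) if wants_detail(e) else e.view(0) for e in entries)


entries = [
    UnfoldableGist("timeout is not the cause",
                   "Raising the timeout to 60s left the test failing", "RAW TRACE A " * 50),
    UnfoldableGist("cache bug suspected",
                   "Failure reproduces only with a cold cache", "RAW TRACE B " * 50),
]
context = build_context(entries, wants_detail=lambda e: "cache" in e.gist)
full_dump = "\n".join(e.view(2) for e in entries)
print(context)
print(len(context), "chars vs", len(full_dump), "for the full dump")
```

The design lever is choosing the level per entry: an agent pays for raw evidence only where its own task needs it.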

Ultimately, with a framework like DeLM, agents can be more efficient because they are prevented from repeatedly reading the same documents or rerunning the same failed analysis; more effective because useful findings are propagated across parallel threads; and more robust because they only share verified claims.

For enterprise builders, DeLM challenges a core assumption: that every multi-agent workflow needs a central controller. The SWE-bench and LongBench-v2 results suggest the decentralized approach isn't just theoretically cleaner. It was more accurate on both benchmarks and, on SWE-bench Verified, cost roughly half as much per task.