Sakana AI launches 'ultra deep research' agent for 100+ page reports in 8 hours
Carl Franzen · 2026-06-16 · via VentureBeat

Tokyo-based AI startup Sakana AI has officially launched its first commercial product, Sakana Marlin.

Billed as a "Virtual CSO" (Chief Strategy Officer), Marlin is an autonomous, B2B research agent that deliberately abandons the instantaneous text generation of modern chatbots in favor of deep, long-horizon reasoning.

What sets Marlin apart from the current ecosystem of AI tools is its temporal scale: instead of returning an answer in seconds, it runs continuous, self-governing reasoning loops for up to eight hours at a time to deliver deeply researched, well-cited, 100-page strategy reports and executive slides. The company has posted sample reports generated by Marlin on its product website.

Available immediately via the company’s website with pricing starting at a pay-as-you-go tier, the platform is designed strictly for enterprise use—specifically targeting corporations, financial institutions, and think tanks.

The generative AI hype cycle has largely been defined by speed. For the past two years, the industry standard has been the ability to generate a poem, a line of code, or a surface-level summary in a matter of seconds. But the enterprise frontier is rapidly shifting from shallow, rapid generation to deep, methodical reasoning.

With Marlin, major businesses are no longer asking how fast an AI can answer, but how deeply it can think.

The Product: A Virtual CSO

What exactly is a business getting when it deploys Sakana Marlin? The workflow is fundamentally different from typical large language model (LLM) interactions. Rather than engaging in a tedious back-and-forth prompt engineering session, the user simply provides a core research topic. Following a brief initial exchange to sharpen the scope and direction of the investigation, the human steps away entirely.

For the next several hours, Marlin operates as a self-contained digital strategy team. It formulates its own initial hypotheses, navigates the web to gather data, cross-references sources to verify findings, and maps the causal dynamics within complex business environments. It is effectively searching for the "winning formula" within a sea of noise.

Think of it less like a search engine and more like a junior strategy consultant locked in a room with a whiteboard and an internet connection. You provide the strategic prompt in the morning, and by the end of the workday, the system delivers a comprehensive, professional-grade portfolio.

In Marlin's case, the final output is not a generic text blob; it is a structured set of strategic options, complete with executive summary slides, appendices, references, and a deeply researched report.

The company highlighted several real-world use cases to demonstrate Marlin's capacity for complex synthesis, including generating detailed resolution scenarios for a theoretical blockade of the Strait of Hormuz, mapping out the fragmented global AI regulation patchwork, and analyzing macroeconomic trends like the return of "bond vigilantes".

Sakana says Marlin relies on multiple AI models, but did not provide specific model names or providers. I've reached out on X to find out more and will update when I receive a response.

The Engine of Long-Horizon Reasoning

Under the hood, Marlin is the commercial culmination of Sakana AI’s extensive laboratory breakthroughs over the past two years.

The product is powered by an exploration engine relying on Sakana's own prior research breakthrough, Adaptive Branching Monte Carlo Tree Search (AB-MCTS), and leverages frameworks derived from "The AI Scientist," an earlier Sakana AI research project featured in the journal Nature that successfully automated the scientific discovery process from ideation to peer review.

To understand how this works in practice, consider a real-world analogy: modern chess engines. When a computer plays chess, it doesn't just look at the board and guess; it plays out thousands of potential future moves, evaluating the strength of each resulting position before committing to an action.

Marlin’s AB-MCTS engine does something similar for research.

Inside the Engine: The Mechanics of AB-MCTS

The chronology of this technology traces back to June 2025, when Sakana AI first introduced the framework to the public alongside the research paper "Wider or Deeper? Scaling LLM Inference-Time Compute with Adaptive Branching Tree Search."

At that time, to encourage developer experimentation with collective AI intelligence, the company released the underlying algorithm as an open-source software library called TreeQuest, distributed under the permissive Apache 2.0 license. This open-source milestone laid the technical foundation for what would eventually evolve into the proprietary, enterprise-grade Marlin product a year later.

Traditionally, when developers attempt to extract higher-quality reasoning from large language models, they rely on a brute-force method called "repeated sampling"—essentially running the model dozens of times in parallel and hoping one of the answers is correct. However, repeated sampling operates blindly; it cannot evaluate its own intermediate steps or pivot based on external feedback.
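To put a number on why repeated sampling is tempting, here is a minimal sketch. It assumes every sample is independent and has the same chance of being correct, which is a simplification and not something Sakana has stated; the function name is illustrative.

```python
def p_at_least_one_correct(p_single, n_samples):
    """Chance that at least one of n independent samples is correct."""
    return 1.0 - (1.0 - p_single) ** n_samples


# A model that is right 5% of the time per attempt, sampled 32 times in parallel.
print(round(p_at_least_one_correct(0.05, 32), 3))
```

The odds of a correct answer existing somewhere in the batch climb quickly with more samples, but nothing in this calculation says which sample is the correct one. That is the "blind" part of the method.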

AB-MCTS replaces this paradigm with a principled, multi-turn approach driven by a Bayesian decision framework. As the AI constructs a strategy report, the system treats the research process as a branching tree of possibilities. At each node of the tree, the algorithm dynamically balances two distinct behaviors based on external feedback signals:

  • Going Wider (Exploration): Spawning entirely new, alternative hypotheses or candidate responses when the current path yields diminishing returns or unresolved contradictions.

  • Going Deeper (Exploitation): Methodically refining, auditing, and building upon an existing candidate solution that shows high strategic promise.
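The wider-versus-deeper trade-off can be illustrated with a toy search loop. This is a sketch under stated assumptions, not Sakana's implementation: it models the "Bayesian decision framework" as Thompson sampling over Beta posteriors, with one "wider" arm and one "deeper" arm per existing child. The `Node` class, the `toy_generate` scoring function, and all names are hypothetical stand-ins for real LLM calls and real feedback.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Node:
    """One candidate answer in the search tree (hypothetical structure)."""
    score: float  # external feedback signal, assumed to lie in [0, 1]
    children: list = field(default_factory=list)


def beta_draw(rng, scores):
    """Sample from a Beta posterior over quality, starting from a uniform prior."""
    successes = sum(scores)
    failures = len(scores) - successes
    return rng.betavariate(1.0 + successes, 1.0 + failures)


def choose_action(node, rng):
    """Return "wider" to add a new child here, or a child index to go deeper."""
    # "Wider" arm: judged by how well this node's direct children have scored.
    best_action = "wider"
    best_draw = beta_draw(rng, [c.score for c in node.children])
    for i, child in enumerate(node.children):
        # "Deeper" arm: judged by the child and the refinements built on it.
        draw = beta_draw(rng, [child.score] + [g.score for g in child.children])
        if draw > best_draw:
            best_action, best_draw = i, draw
    return best_action


def search(generate, budget, seed=0):
    """Spend `budget` generation calls, deciding wider vs. deeper at each node."""
    rng = random.Random(seed)
    root = Node(score=0.0)  # empty starting point, not a real candidate
    best = 0.0
    for _ in range(budget):
        node = root
        while True:
            action = choose_action(node, rng)
            if action == "wider":
                break
            node = node.children[action]
        child = Node(score=generate(node.score, rng))
        node.children.append(child)
        best = max(best, child.score)
    return root, best


def toy_generate(parent_score, rng):
    """Stand-in for an LLM call plus scoring: a noisy tweak of the parent."""
    return min(1.0, max(0.0, parent_score + rng.uniform(-0.1, 0.3)))


root, best = search(toy_generate, budget=50)
print(f"best score after 50 calls: {best:.2f}")
```

A node with no children always goes wider, so each pass through the tree ends by adding exactly one new candidate. Branches whose refinements keep scoring well attract more draws, while stagnant ones lose out to fresh alternatives.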

What transforms this from a laboratory experiment into a commercial engine is its extension into Multi-LLM AB-MCTS.

Sakana AI’s architecture introduces a critical third dimension to the search tree: the ability to dynamically choose which model to invoke for a specific sub-task, treating the industry’s leading frontier models as a plug-and-play collective intelligence network.

According to technical documentation published by the company, the engine can coordinate highly heterogeneous models—allowing an orchestration model to delegate initial ideation to one LLM, while utilizing a reasoning-heavy model to audit, verify, and correct intermediate errors generated earlier in the search tree.
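One simple way to picture the "which model" dimension is as a bandit problem, where the system learns from feedback which model tends to produce better results. The sketch below is an illustration only: it assumes Thompson sampling with one Beta posterior per model, and the model names and success rates are invented for the demo. Sakana has not disclosed which models Marlin uses or how they are chosen.

```python
import random


class ModelSelector:
    """Thompson sampling over a pool of models, one Beta posterior per model."""

    def __init__(self, model_names, seed=0):
        self.rng = random.Random(seed)
        # [alpha, beta] pseudo-counts; [1, 1] is a uniform prior.
        self.posteriors = {name: [1.0, 1.0] for name in model_names}

    def pick(self):
        """Sample a plausible quality for each model and pick the highest."""
        draws = {
            name: self.rng.betavariate(alpha, beta)
            for name, (alpha, beta) in self.posteriors.items()
        }
        return max(draws, key=draws.get)

    def update(self, name, score):
        """Fold a feedback score in [0, 1] into the chosen model's posterior."""
        self.posteriors[name][0] += score
        self.posteriors[name][1] += 1.0 - score


# Hypothetical pool: the names and success rates are made up for the demo.
true_success_rate = {"model_a": 0.3, "model_b": 0.7}
selector = ModelSelector(true_success_rate, seed=0)
feedback_rng = random.Random(1)
calls = {name: 0 for name in true_success_rate}

for _ in range(500):
    name = selector.pick()
    calls[name] += 1
    passed = feedback_rng.random() < true_success_rate[name]
    selector.update(name, 1.0 if passed else 0.0)

print(calls)
```

In this toy setup the selector shifts most of its calls toward the stronger model while still occasionally trying the weaker one. A production system would presumably keep separate statistics per sub-task, since a model that is weak at auditing may still be strong at ideation.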

By scaling up compute at inference time—leveraging the distinct "personalities" and strengths of multiple foundation models over thousands of automated cycles—AB-MCTS provides the mathematical guardrails Marlin requires. It ensures that the resulting 100-page strategy reports are not merely long-winded AI generations, but the highly vetted product of systemic, automated trial-and-error.

Licensing, Data, and Enterprise Implications

It is crucial to note that Sakana Marlin is distinctly not a general consumer tool; it is a commercial software-as-a-service (SaaS) offering restricted to corporate entities, organizations, and sole proprietors.

For enterprises, licensing and data handling terms are often the determining factors in software adoption. Unlike many consumer-grade AI tools that silently harvest user inputs and proprietary data to train future foundational models, Sakana Marlin operates under a strict, enterprise-grade data policy.

Neither Sakana AI nor its external AI service providers will use customer data or inputs for model training or fine-tuning unless the client provides explicit opt-in consent.

Even with consent, data is heavily processed to remove personally identifiable information. This closed-loop security is absolutely vital for companies handling sensitive M&A research, unreleased product strategies, or proprietary market analyses.

The commercial licensing is structured into tiered pricing models that reflect its enterprise nature:

  • Pay-as-you-go: Users can purchase credits on demand, with a single run costing 100 credits, and add-on credits priced at ¥98 ($0.61 USD) each.

  • Pro Plan: At ¥150,000 ($935.68 USD) per month, businesses receive 2,000 credits, bringing down the cost of add-on credits to ¥90 ($0.56 USD).

  • Team Plan: Geared toward larger departments, this ¥400,000 ($2,495.14 USD) per month tier includes 6,000 credits, lowering add-on costs to ¥85 ($0.53 USD) per credit.

  • Enterprise: Fully custom quotes with dedicated support and customized credit allocations.
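The tiers above translate into very different per-report costs depending on volume. The sketch below works through the yen figures quoted in the list. It assumes that a run costs 100 credits on every plan, that pay-as-you-go carries no monthly fee, and that taxes and unused credits can be ignored; none of these are confirmed beyond what the list states. The Enterprise tier is omitted because its pricing is custom.

```python
CREDITS_PER_RUN = 100  # from the pay-as-you-go description; assumed for all plans

# Prices in yen, as quoted above. A zero monthly fee for pay-as-you-go is an assumption.
PLANS = {
    "Pay-as-you-go": {"monthly_fee": 0, "included": 0, "addon": 98},
    "Pro": {"monthly_fee": 150_000, "included": 2_000, "addon": 90},
    "Team": {"monthly_fee": 400_000, "included": 6_000, "addon": 85},
}


def monthly_cost(plan_name, runs):
    """Total yen for `runs` reports in one month on the given plan."""
    plan = PLANS[plan_name]
    credits_needed = runs * CREDITS_PER_RUN
    extra_credits = max(0, credits_needed - plan["included"])
    return plan["monthly_fee"] + extra_credits * plan["addon"]


def cheapest_plan(runs):
    """Name of the plan with the lowest monthly cost at this volume."""
    return min(PLANS, key=lambda name: monthly_cost(name, runs))


for runs in (1, 15, 16, 47, 48):
    print(runs, cheapest_plan(runs), monthly_cost(cheapest_plan(runs), runs))
```

Under these assumptions a single pay-as-you-go report costs ¥9,800. Pro becomes the cheaper option from 16 reports a month, and Team from 48.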

Why Sakana Is Worth Watching

Sakana AI’s transition into a commercial enterprise powerhouse is rooted in the pedigree of its founders, who famously helped spark the current generative AI boom.

Formed in Tokyo in 2023, the startup was co-founded by Llion Jones—a co-author of Google’s seminal 2017 “Attention Is All You Need” paper who coined the term “transformer”—and David Ha, a former Google Brain researcher and head of research at Stability AI.

The decision to build a new laboratory outside the Silicon Valley bubble was a deliberate rejection of the current AI ecosystem. At a TED AI conference in late 2025, Jones candidly expressed that he was "absolutely sick" of transformers, warning that the intense pressure from investors and the hyper-fixation on scaling single, monolithic models had calcified the industry's creativity and blinded researchers to the next major breakthrough.

To break free from this "big company-itis," Jones and Ha structured Sakana AI around principles of biomimicry and evolutionary computing.

The company's name, derived from the Japanese word for fish, reflects its core technical philosophy: leveraging collective intelligence similar to schools of fish, ant colonies, or insect swarms. Rather than attempting to build one massive, do-it-all foundation model, Sakana’s research has consistently focused on deploying networks of smaller, specialized models that collaborate dynamically to adapt to complex environments.

This philosophy posits that by treating individual AI models as members of a "dream team" with complementary strengths, systems can achieve more robust and cost-effective reasoning than relying on sheer scale alone.

This nature-inspired approach quickly yielded dividends in rigorous, competitive testing. Sakana AI has made significant strides in "inference-time scaling"—allocating computational resources during the problem-solving phase to allow models to think, iterate, and refine their own answers over extended periods.

In early 2026, the company’s ALE-Agent took first place in the highly complex AtCoder Heuristic Contest (AHC058), a combinatorial optimization challenge, outperforming over 800 top-tier human programmers by autonomously rebuilding and testing hundreds of solutions over a four-hour window.

Similarly, Sakana introduced "RL Conductor," a small 7-billion-parameter model trained via reinforcement learning specifically to orchestrate and delegate tasks among a diverse pool of worker models—ranging from GPT-5 to Claude Sonnet 4—achieving state-of-the-art results on reasoning benchmarks at a fraction of traditional computing costs.

Sakana's rapid evolution from a disruptive research lab to a commercial software provider has attracted intense attention from global financial heavyweights.

By late 2025, the Tokyo-based startup secured a massive Series B funding round that pushed its post-money valuation past $2.6 billion, cementing its status as one of Japan’s most highly valued private tech companies. The firm boasts a sprawling roster of strategic investors, including early venture backers Khosla Ventures, Lux Capital, and New Enterprise Associates (NEA), alongside industry titans like Nvidia and Google.

As Sakana has expanded its focus toward mission-critical sectors like defense and finance, it has also drawn investments from major global banking institutions like Mitsubishi UFJ Financial Group (MUFG) and Citi, as well as enterprise tech giant Salesforce, positioning the startup to actively reshape corporate AI infrastructure from the ground up.

Community Reactions and Field Testing

Sakana AI’s shift toward commercial, long-horizon agents did not happen in a vacuum. The company ran a rigorous closed beta test beginning in April 2026, putting the tool in the hands of approximately 300 professionals across financial institutions, consulting firms, and think tanks. The feedback underscores a stark qualitative difference between standard generative chatbots and Marlin’s autonomous, fact-driven approach.

A senior consultant at a major Tokyo consulting firm noted that the tool "exceeded expectations by discovering angles we hadn't even imagined," praising its ability to match human comprehensiveness while stripping away human bias. Meanwhile, a cybersecurity division at a major Japanese IT system integrator lauded the system for providing "a highly convincing report driven by high-quality, primary research," rather than relying on recycled secondary sources.

On social media, the company’s announcement resonated with the broader tech community's growing appetite for autonomous agents.

As the AI industry matures, the value proposition is clearly shifting. Tools that act as fast, conversational encyclopedias are becoming commoditized. With Sakana Marlin, the focus moves entirely to separating the heavy lifting of thinking from the final act of deciding. By delegating the exhaustive mapping of causal dynamics to an agent capable of sustained reasoning, human executives are free to do what they do best: take action.