Sakana AI Commercializes AB-MCTS in Sakana Marlin, an Enterprise Agent Generating Up to 100-Page Research Reports With Slides
https://www.facebook.com/MarkTechPost/ · 2026-06-16 · via MarkTechPost

Tokyo-based Sakana AI shipped its first commercial product, Sakana Marlin, this week. The Sakana team positions it as a Virtual CSO (Chief Strategy Officer): a B2B autonomous research agent built for enterprises.

Marlin does not answer in seconds like a chatbot. You give it one research topic. It then runs autonomously for up to about eight hours. Each run returns a long report plus a presentation slide deck. Sakana says a single session issues hundreds to thousands of LLM queries.

What is Sakana Marlin

Marlin is an enterprise research agent, not a chat assistant. You give it one topic or question. It then plans hypotheses, browses sources, and verifies findings on its own. It compresses weeks of strategy work into hours.

The deliverable is structured for decision-makers. The Japanese announcement describes reports of dozens of pages. The English announcement cites reports of up to roughly 100 pages. At a press hands-on, reports ran 60–100 pages and cited 60–80 sources. Each report includes a main body, references, and appendices. Presentation slides are generated using image-generation AI.

The Sakana team refined Marlin through a closed beta in April 2026. Around 300 professionals tested it on real tasks during that beta. Those tasks spanned strategy formulation, market research, risk analysis, and competitive analysis. Sakana has also partnered with MUFG and taken strategic investment from Citigroup.

Inside AB-MCTS: Wider or Deeper

The backbone of Marlin is AB-MCTS, or Adaptive Branching Monte Carlo Tree Search. It comes from Sakana’s earlier paper, “Wider or Deeper? Scaling LLM Inference-Time Compute with Adaptive Branching Tree Search.”

AB-MCTS treats reasoning as a tree-search problem. At each step the algorithm makes one decision. It can go wider by generating a new candidate answer. Or it can go deeper by refining a promising existing answer. Standard repeated sampling only goes wider in parallel, then hopes one answer is right.
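The wider-or-deeper decision can be sketched in a few lines of plain Python. This is a toy illustration, not Sakana's implementation: it assumes a Beta posterior with Thompson sampling as the decision rule, and `toy_generate`, `thompson_draw`, and `search` are hypothetical names invented for the example. Here, a new child of the root counts as "wider" (a fresh candidate), and a new child of any other node counts as "deeper" (a refinement).

```python
import random
from dataclasses import dataclass, field


@dataclass
class Node:
    score: float                                  # candidate quality in [0, 1]
    depth: int                                    # 0 = root (holds no answer)
    children: list = field(default_factory=list)

    def subtree_scores(self):
        """Collect every score observed at or below this node."""
        scores = [self.score]
        for child in self.children:
            scores.extend(child.subtree_scores())
        return scores


def thompson_draw(scores, rng):
    """Sample a plausible quality from a Beta posterior over observed scores."""
    alpha = 1.0 + sum(scores)
    beta = 1.0 + sum(1.0 - s for s in scores)
    return rng.betavariate(alpha, beta)


def toy_generate(parent, rng):
    """Hypothetical stand-in for 'call an LLM, then score its answer'."""
    if parent.depth == 0:
        score = rng.random()                      # brand-new candidate
    else:
        nudged = parent.score + rng.uniform(-0.1, 0.2)   # refine the parent
        score = min(1.0, max(0.0, nudged))
    return Node(score=score, depth=parent.depth + 1)


def search(budget, seed=0):
    """Spend `budget` generations, choosing where to expand at each step."""
    rng = random.Random(seed)
    root = Node(score=0.0, depth=0)
    wider = deeper = 0
    for _ in range(budget):
        node = root
        while True:
            # "Generate here" competes against descending into each child.
            gen_draw = thompson_draw([c.score for c in node.children], rng)
            child_draws = [thompson_draw(c.subtree_scores(), rng)
                           for c in node.children]
            if not node.children or gen_draw >= max(child_draws):
                node.children.append(toy_generate(node, rng))
                if node is root:
                    wider += 1                    # new candidate answer
                else:
                    deeper += 1                   # refinement of an answer
                break
            node = node.children[child_draws.index(max(child_draws))]
    return {"wider": wider, "deeper": deeper, "best": max(root.subtree_scores())}


result = search(budget=24)
print(result)
```

Repeated sampling corresponds to always expanding at the root. The sketch differs only in letting observed scores pull part of the budget toward refining branches that already look promising.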

A multi-LLM variant adds a further choice: which model handles the step, so a step can be routed to a different model entirely. In Sakana’s reported ARC-AGI-2 experiments, this collaboration helped. Combining o4-mini, Gemini 2.5 Pro, and DeepSeek-R1 solved about 27.5% of tasks, while o4-mini alone solved about 23%. Marlin applies the same adaptive search to long-horizon research.
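One simple way to picture per-step model routing is a bandit that learns which model tends to succeed. The sketch below assumes Thompson sampling over Beta posteriors as the routing rule, which may not match what Marlin or TreeQuest does internally. The model names and their hidden success rates are made up for illustration and say nothing about any real model.

```python
import random

# Hypothetical models with hidden per-step success rates (illustrative only).
HIDDEN_QUALITY = {"model_a": 0.3, "model_b": 0.5, "model_c": 0.7}


def route_steps(rounds, seed=0):
    """Pick one model per step and learn from simulated success or failure."""
    rng = random.Random(seed)
    wins = {name: 0 for name in HIDDEN_QUALITY}
    losses = {name: 0 for name in HIDDEN_QUALITY}
    picks = {name: 0 for name in HIDDEN_QUALITY}
    for _ in range(rounds):
        # Draw one plausible success rate per model from its posterior.
        draws = {name: rng.betavariate(1 + wins[name], 1 + losses[name])
                 for name in HIDDEN_QUALITY}
        chosen = max(draws, key=draws.get)
        picks[chosen] += 1
        # Simulated outcome; a real system would score the model's output.
        if rng.random() < HIDDEN_QUALITY[chosen]:
            wins[chosen] += 1
        else:
            losses[chosen] += 1
    return picks


picks = route_steps(rounds=500)
print(picks)
```

Early steps are spread across all three models. As evidence accumulates, the rule tends to favor the stronger models while still trying the others occasionally.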

The second key component for Marlin is workflow automation from Sakana’s AI Scientist project. That project demonstrated autonomous scientific discovery and was published in Nature.

[Interactive demo: AB-MCTS “Wider or Deeper?” search. A simplified visual of Sakana AI’s Adaptive Branching Monte Carlo Tree Search: at each step the policy chooses to widen (new candidate) or deepen (refine a promising line). Greener nodes carry higher scores, the best path is highlighted, and a “Multi-LLM” toggle routes steps across different models. Illustrative model of AB-MCTS (TreeQuest, Apache 2.0), © Marktechpost.]

How Marlin Compares

Marlin competes on depth, not speed. Conventional deep-research tools answer in minutes to tens of minutes. Marlin deliberately spends hours to raise output quality. The competitor run times below are approximate and reported, not official figures.

| Tool | Typical run time | Output | Primary user |
| --- | --- | --- | --- |
| Sakana Marlin | Up to ~8 hours | Report (dozens to ~100 pages) + slides | Enterprise strategy teams |
| OpenAI Deep Research | ~Minutes to tens of minutes | Cited text report | General and pro users |
| Perplexity Deep Research | ~A few minutes | Cited text answer | General users |
| Google Gemini Deep Research | ~Minutes | Cited text report | General and workspace users |

The trade-off is explicit. You wait longer and pay per run. In return you get deeper hypothesis testing and a finished deliverable. You can cancel a run anytime, but credits are still consumed.

Pricing

Sakana offers pay-as-you-go along with Pro, Team, and Enterprise tiers. Pay-as-you-go starts at 100 credits per run, at ¥98 per credit. Pro is ¥150,000 per month and includes 2,000 credits. Team is ¥400,000 per month and includes 6,000 credits. Enterprise pricing is custom, with dedicated support.
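The tier arithmetic is easy to check. The sketch below uses only the figures quoted above and adds two assumptions that the source does not state: every run costs exactly the 100-credit starting price, and all included credits are used each month.

```python
import math

PAYG_YEN_PER_CREDIT = 98
CREDITS_PER_RUN = 100        # stated starting cost; assumed flat for every run

# (monthly fee in yen, included credits), as quoted in the pricing section
PLANS = {"Pro": (150_000, 2_000), "Team": (400_000, 6_000)}

payg_yen_per_run = PAYG_YEN_PER_CREDIT * CREDITS_PER_RUN

summary = {}
for name, (monthly_fee, credits) in PLANS.items():
    summary[name] = {
        "yen_per_credit": monthly_fee / credits,
        "runs_included": credits // CREDITS_PER_RUN,
        # Fewest monthly runs at which the flat fee beats pay-as-you-go.
        "break_even_runs": math.ceil(monthly_fee / payg_yen_per_run),
    }

print("Pay-as-you-go per run (yen):", payg_yen_per_run)
for name, row in summary.items():
    print(name, row)
```

Under these assumptions a pay-as-you-go run costs ¥9,800. Pro works out to ¥75 per credit and pays off from 16 runs a month. Team works out to about ¥66.67 per credit and pays off from 41 runs a month.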

Use Cases, With Examples

Marlin suits high-stakes questions where research is the bottleneck. Here are concrete examples drawn from its target tasks.

  • Market entry: 'Assess Japan's stablecoin and tokenized-payments market after regulatory change.' Marlin maps drivers, risks, and structured options into a report.
  • Risk analysis: 'Model resolution scenarios for a Strait of Hormuz blockade.' It compares hypotheses, not just summaries, before drawing conclusions.
  • Competitive analysis: 'Profile three rivals and rank our positioning gaps.' It returns slides ready for a strategy review.

Each example fits one prompt and one unattended run. A human still reviews the cited output before any decision.

Try the Engine Yourself: TreeQuest

You cannot self-host Marlin. But you can run its core algorithm today. Sakana open-sourced AB-MCTS as TreeQuest under the Apache 2.0 license. Install it, define a generate function, then run a fixed search budget.

```python
import random
import treequest as tq

# Each node holds a user-defined state; score must be normalized to [0, 1].
def generate(parent_state):
    if parent_state is None:               # None means expand from the root
        new_state = "Initial draft"
    else:
        new_state = f"Refined: {parent_state}"
    score = random.random()                # swap this for an LLM-based score
    return new_state, score

algo = tq.ABMCTSA()                         # Adaptive Branching MCTS (variant A)
search_tree = algo.init_tree()

for _ in range(10):                         # generation budget of 10
    search_tree = algo.step(search_tree, {"generate": generate})

best_state, best_score = tq.top_k(search_tree, algo, k=1)[0]
print("BEST:", best_state, round(best_score, 3))
```

Swap the random score for an LLM judge to reproduce the real pattern. TreeQuest also ships multi-LLM search and checkpointing for long runs. Checkpointing matters because long sessions can hit API errors midway.
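To see why checkpointing helps, here is a generic save-and-resume pattern built on the standard library. It is not TreeQuest's checkpointing API: `run_with_checkpoints`, the pickle file layout, and the simulated `ConnectionError` are all assumptions made for this sketch.

```python
import os
import pickle
import random
import tempfile


def run_with_checkpoints(path, budget, fail_at=None):
    """Run `budget` steps, saving state after each; optionally simulate a crash."""
    if os.path.exists(path):
        with open(path, "rb") as fh:
            state = pickle.load(fh)               # resume where we stopped
    else:
        state = {"step": 0, "scores": [], "rng": random.Random(0).getstate()}
    rng = random.Random()
    rng.setstate(state["rng"])
    while state["step"] < budget:
        if fail_at is not None and state["step"] == fail_at:
            raise ConnectionError("simulated API error")
        state["scores"].append(rng.random())      # stand-in for one search step
        state["step"] += 1
        state["rng"] = rng.getstate()
        tmp_path = path + ".tmp"
        with open(tmp_path, "wb") as fh:
            pickle.dump(state, fh)
        os.replace(tmp_path, path)                # atomic swap: no torn files
    return state


with tempfile.TemporaryDirectory() as workdir:
    ckpt = os.path.join(workdir, "search.ckpt")
    try:
        run_with_checkpoints(ckpt, budget=10, fail_at=6)   # dies after 6 steps
    except ConnectionError:
        pass
    resumed = run_with_checkpoints(ckpt, budget=10)        # finishes the rest
    clean = run_with_checkpoints(os.path.join(workdir, "clean.ckpt"), budget=10)

print(resumed["step"], resumed["scores"] == clean["scores"])
```

Saving the random-generator state alongside the results lets the resumed run match an uninterrupted one, and only the steps after the failure are repeated.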

Strengths and Weaknesses

Strengths

  • Peer-reviewed foundations: AB-MCTS at NeurIPS and AI Scientist in Nature.
  • Finished deliverables, including references, appendices, and slides.
  • Adaptive compute spends effort on the most promising branches.
  • The open-source core (TreeQuest) lets AI researchers study the method.

Weaknesses

  • Long runtimes make iteration slow versus minute-scale research tools.
  • Automated reports can contain hard-to-spot errors that need human review.
  • Pricing and design target enterprises, not individual developers.
  • Marlin itself is closed; only the underlying algorithm is open.

Key Takeaways

  • Sakana Marlin runs autonomous research for up to about eight hours per task.
  • One run produces a report of dozens to roughly 100 pages, plus slides.
  • It builds on AB-MCTS (NeurIPS 2025 Spotlight) and AI Scientist workflows (Nature).
  • Entry pricing is pay-as-you-go: 100 credits per run at ¥98 per credit.
  • It targets finance, corporate strategy, consulting, and think-tank teams.

Sources

  • Sakana AI — Sakana Marlin release: https://sakana.ai/marlin-release/
  • Sakana AI — Sakana Marlin product page: https://sakana.ai/marlin/
  • Sakana AI — AB-MCTS research and TreeQuest: https://sakana.ai/ab-mcts/
  • SakanaAI/treequest (GitHub, Apache 2.0): https://github.com/SakanaAI/treequest