

The MiniMax-M2 Series: Mini Activations Unleashing Max Real-World Intelligence
MiniMax · 2026-05-27 · via cs.CL updates on arXiv.org

Authors: MiniMax: Aili Chen, Aonian Li, Baichuan Zhou, Bangwei Gong, Binyang Jiang, Boji Dan, Changqing Yu, Chao Wang, Cheng Ma, Cheng Zhong, Cheng Zhu, Chengjun Xiao, Chengyi Yang, Chengyu Du, Chenyang Zhang, Chi Zhang, Chuangyi Huang, Chunhao Zhang, Chunhui Du, Chunyu Zhao, Congchao Guo, Da Chen, Deming Ding, Dianjun Sun, Dongyu Zhang, Enhui Yang, Fei Yu, Guang Zheng, Guodong Zheng, Guohong Li, Haichao Zhu, Haigang Zhou, Haimo Zhang, Han Ding, Hao Zhang, Haohai Sun, Haolin Lyu, Haonan Lu, Haoyu Wang, Huajie Shi, Huiyang Li, Jiacheng Chen, Jian Zhang, Jiaqi Zhuang, Jiaren Cai, Jiaxin Pan, Jiayao Li, Jiayuan Song, Jichuan Zhang, Jie Wang, Jihao Gu, Jin Zhu, Jingwei Dong, Jingyang Li, Jingyu Zhang, Jingze Zhuang, Jinhao Tian, Jinli Liu, Jinyi Hu, Jun Tao, Jun Zhang, Junbin Ruan, Junhao Xu, Junjie Yan, Junteng Liu, Junxian He, Kang Xu, Ke Ji, Ke Yang, Kecheng Xiao, Keyu Duan, Keyu Li, Le Han, Letian Ruan, Li Yuan, Lianfei Yu, Liheng Feng, Lijie Mo, Lin Li, Lingye Bao, Lingyu Yang, Lingyuan Zhou, Loki, Lu Chen, Lunbin Ceng, Ming Li, Ming Zhong, Mingliang Tao, Mingyuan Chi, Mujie Lin, Nan Hu, Ningxin Chen, Peiyin Zhu, Peng Gao, Pengcheng Gao, Pengfei Li, Penglin Li, Pengyu Zhao, Qibin Ren

et al. (106 additional authors not shown)


Abstract: We introduce the MiniMax-M2 series, a family of Mixture-of-Experts language models built around the principle that mini activations can unleash maximum real-world intelligence. The flagship M2 contains 229.9B total parameters, of which only 9.8B are activated per token. Designed end-to-end for agentic deployment, the M2 series rests on three components: (i) agent-driven data pipelines that produce large-scale, verifiable trajectories across agentic coding and agentic cowork, each grounded in an executable workspace and an artifact-aligned reward; (ii) Forge, a scalable agent-native RL system that adapts to long-horizon agent trajectories, paired with windowed-FIFO scheduling, prefix-tree merging, inference optimization, and a clean training-inference-agent decoupling that supports both white-box and black-box agents; and (iii) an early step toward self-evolution in the latest M2.7 checkpoint, which autonomously debugs training runs and modifies its own scaffold. Across M2 through M2.7, this combination translates a mini-activation footprint into frontier-tier performance on agentic coding, deep search, office-task, and reasoning benchmarks.
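Two ideas in the abstract lend themselves to a small illustration: the "mini-activation" ratio implied by the stated parameter counts, and the general intuition behind merging trajectories in a prefix tree. The sketch below is not taken from the report. The parameter figures come from the abstract, but the function names, the toy token ids, and the assumption that prefix-tree merging means "store a shared token prefix once" are hypothetical and only meant to make the terms concrete.

```python
# Hypothetical illustration; not code from the MiniMax-M2 report.

TOTAL_PARAMS_B = 229.9   # total parameters in billions (from the abstract)
ACTIVE_PARAMS_B = 9.8    # parameters activated per token in billions (from the abstract)


def activation_ratio(active_b: float, total_b: float) -> float:
    """Fraction of the model's parameters used for each token."""
    return active_b / total_b


def count_tokens_after_prefix_merge(trajectories: list[list[int]]) -> tuple[int, int]:
    """Return (naive_token_count, merged_token_count).

    Assumption: trajectories that share a leading run of tokens can share
    that prefix. Each trajectory is inserted into a trie keyed by token id,
    so a token on a shared prefix creates only one node.
    """
    root: dict = {}
    merged = 0
    for tokens in trajectories:
        node = root
        for tok in tokens:
            if tok not in node:
                node[tok] = {}   # new node: this token position is not shared
                merged += 1
            node = node[tok]
    naive = sum(len(t) for t in trajectories)
    return naive, merged


if __name__ == "__main__":
    ratio = activation_ratio(ACTIVE_PARAMS_B, TOTAL_PARAMS_B)
    print(f"activated fraction per token: {ratio:.2%}")

    # Three toy rollouts that branch from a common prompt prefix [1, 2].
    rollouts = [[1, 2, 3, 4], [1, 2, 3, 5], [1, 2, 6]]
    naive, merged = count_tokens_after_prefix_merge(rollouts)
    print(f"tokens without merging: {naive}, with merging: {merged}")
```

With the figures from the abstract, roughly 4.3% of the parameters are active for any given token. In the toy rollouts, the trie stores 6 token positions instead of 11, which is the kind of saving one would expect when many agent trajectories branch from the same context. How Forge actually implements merging is not described in the abstract.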
Comments: Technical Report. 35 pages, 10 figures, 4 tables
Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG)
Cite as: arXiv:2605.26494 [cs.AI]
  (or arXiv:2605.26494v1 [cs.AI] for this version)
  https://doi.org/10.48550/arXiv.2605.26494

arXiv-issued DOI via DataCite (pending registration)

Submission history

From: Li Yuan
[v1] Tue, 26 May 2026 03:16:11 UTC (1,591 KB)