惯性聚合 高效追踪和阅读你感兴趣的博客、新闻、科技资讯
阅读原文 在惯性聚合中打开

推荐订阅源

Google DeepMind News
Google DeepMind News
Exploit-DB.com RSS Feed
Exploit-DB.com RSS Feed
Security Latest
Security Latest
P
Palo Alto Networks Blog
AWS News Blog
AWS News Blog
NISL@THU
NISL@THU
T
Threatpost
OSCHINA 社区最新新闻
OSCHINA 社区最新新闻
Latest news
Latest news
让小产品的独立变现更简单 - ezindie.com
让小产品的独立变现更简单 - ezindie.com
WordPress大学
WordPress大学
J
Java Code Geeks
P
Privacy International News Feed
阮一峰的网络日志
阮一峰的网络日志
S
Schneier on Security
博客园 - 聂微东
Project Zero
Project Zero
美团技术团队
Recent Commits to openclaw:main
Recent Commits to openclaw:main
Threat Intelligence Blog | Flashpoint
Threat Intelligence Blog | Flashpoint
Scott Helme
Scott Helme
I
Intezer
钛媒体:引领未来商业与生活新知
钛媒体:引领未来商业与生活新知
H
Hacker News: Front Page
S
Security @ Cisco Blogs
博客园 - 司徒正美
O
OpenAI News
Last Week in AI
Last Week in AI
L
LINUX DO - 热门话题
酷 壳 – CoolShell
酷 壳 – CoolShell
SecWiki News
SecWiki News
月光博客
月光博客
S
Security Affairs
The GitHub Blog
The GitHub Blog
P
Privacy & Cybersecurity Law Blog
S
Secure Thoughts
V
V2EX
S
Securelist
F
Fortinet All Blogs
W
WeLiveSecurity
D
Docker
博客园 - 三生石上(FineUI控件)
Simon Willison's Weblog
Simon Willison's Weblog
奇客Solidot–传递最新科技情报
奇客Solidot–传递最新科技情报
cs.AI updates on arXiv.org
cs.AI updates on arXiv.org
C
Cyber Attacks, Cyber Crime and Cyber Security
V
Visual Studio Blog
www.infosecurity-magazine.com
www.infosecurity-magazine.com
Webroot Blog
Webroot Blog
Engineering at Meta
Engineering at Meta

cs.CL updates on arXiv.org

Playing Along: Learning a Double-Agent Defender for Belief Steering via Theory of Mind RPA-Check: A Multi-Stage Automated Framework for Evaluating Dynamic LLM-based Role-Playing Agents A Triadic Suffix Tokenization Scheme for Numerical Reasoning Hidden Measurement Error in LLM Pipelines Distorts Annotation, Evaluation, and Benchmarking Synthius-Mem: Brain-Inspired Hallucination-Resistant Persona Memory Achieving 94.4% Memory Accuracy and 99.6% Adversarial Robustness on LoCoMo Time is Not a Label: Continuous Phase Rotation for Temporal Knowledge Graphs and Agentic Memory NovBench: Evaluating Large Language Models on Academic Paper Novelty Assessment Policy Split: Incentivizing Dual-Mode Exploration in LLM Reinforcement with Dual-Mode Entropy Regularization METER: Evaluating Multi-Level Contextual Causal Reasoning in Large Language Models Think Before you Write: QA-Guided Reasoning for Character Descriptions in Books METRO: Towards Strategy Induction from Expert Dialogue Transcripts for Non-collaborative Dialogues Retrieval as Generation: A Unified Framework with Self-Triggered Information Planning Do LLMs Know Tool Irrelevance? Demystifying Structural Alignment Bias in Tool Invocations Enhancing Multimodal Large Language Models for Ancient Chinese Character Evolution Analysis via Glyph-Driven Fine-Tuning Polyglot Teachers: Evaluating Language Models for Multilingual Synthetic Data Generation Exploring Knowledge Conflicts for Faithful LLM Reasoning: Benchmark and Method CocoaBench: Evaluating Unified Digital Agents in the Wild MathAgent: Adversarial Evolution of Constraint Graphs for Mathematical Reasoning Data Synthesis Evaluating Memory Capability in Continuous Lifelog Scenario How Robust Are Large Language Models for Clinical Numeracy? An Empirical Study on Numerical Reasoning Abilities in Clinical Contexts Efficient Training for Cross-lingual Speech Language Models Shared Emotion Geometry Across Small Language Models: A Cross-Architecture Study of Representation, Behavior, and Methodological Confounds A Systematic Analysis of the Impact of Persona Steering on LLM Capabilities Uncertainty-Aware Web-Conditioned Scientific Fact-Checking When Valid Signals Fail: Regime Boundaries Between LLM Features and RL Trading Policies When Verification Fails: How Compositionally Infeasible Claims Escape Rejection Mem$^2$Evolve: Towards Self-Evolving Agents via Co-Evolutionary Capability Expansion and Experience Distillation AOP-Smart: A RAG-Enhanced Large Language Model Framework for Adverse Outcome Pathway Analysis OccuBench: Evaluating AI Agents on Real-World Professional Tasks via Language Environment Simulation Advancing Polish Language Modeling through Tokenizer Optimization in the Bielik v3 7B and 11B Series TInR: Exploring Tool-Internalized Reasoning in Large Language Models Do BERT Embeddings Encode Narrative Dimensions? A Token-Level Probing Analysis of Time, Space, Causality, and Character in Fiction Generating Multiple-Choice Knowledge Questions with Interpretable Difficulty Estimation using Knowledge Graphs and Large Language Models Deep-Reporter: Deep Research for Grounded Multimodal Long-Form Generation BlasBench: An Open Benchmark for Irish Speech Recognition Too Nice to Tell the Truth: Quantifying Agreeableness-Driven Sycophancy in Role-Playing Language Models Learning and Enforcing Context-Sensitive Control for LLMs HeceTokenizer: A Syllable-Based Tokenization Approach for Turkish Retrieval Efficient Process Reward Modeling via Contrastive Mutual Information Computational Lesions in Multilingual Language Models Separate Shared and Language-specific Brain Alignment Bridging Linguistic Gaps: Cross-Lingual Mapping in Pre-Training and Dataset for Enhanced Multilingual LLM Performance Knowing What to Stress: A Discourse-Conditioned Text-to-Speech Benchmark Early Decisions Matter: Proximity Bias and Initial Trajectory Shaping in Non-Autoregressive Diffusion Language Models LLMs Should Incorporate Explicit Mechanisms for Human Empathy ReFEree: Reference-Free and Fine-Grained Method for Evaluating Factual Consistency in Real-World Code Summarization Structure-Grounded Knowledge Retrieval via Code Dependencies for Multi-Step Data Reasoning From Query to Counsel: Structured Reasoning with a Multi-Agent Framework and Dataset for Legal Consultation Dynamic Adaptive Attention and Supervised Contrastive Learning: A Novel Hybrid Framework for Text Sentiment Classification EviCare: Enhancing Diagnosis Prediction with Deep Model-Guided Evidence for In-Context Reasoning NOSE: Neural Olfactory-Semantic Embedding with Tri-Modal Orthogonal Contrastive Learning Instruction Data Selection via Answer Divergence CodaRAG: Connecting the Dots with Associativity Inspired by Complementary Learning Turing or Cantor: That is the Question LASQ: A Low-resource Aspect-based Sentiment Quadruple Extraction Dataset NameBERT: Scaling Name-Based Nationality Classification with LLM-Augmented Open Academic Data BLUEmed: Retrieval-Augmented Multi-Agent Debate for Clinical Error Detection A Structured Clustering Approach for Inducing Media Narratives Adaptive Multi-Expert Reasoning via Difficulty-Aware Routing and Uncertainty-Guided Aggregation Comparative Analysis of Large Language Models in Healthcare CodeComp: Structural KV Cache Compression for Agentic Coding Relational Probing: LM-to-Graph Adaptation for Financial Prediction FAITH: Factuality Alignment through Integrating Trustworthiness and Honestness ODUTQA-MDC: A Task for Open-Domain Underspecified Tabular QA with Multi-turn Dialogue-based Clarification Nationality encoding in language model hidden states: Probing culturally differentiated representations in persona-conditioned academic text Think in Sentences: Explicit Sentence Boundaries Enhance Language Model's Capabilities Training-Free Cross-Lingual Dysarthria Severity Assessment via Phonological Subspace Analysis in Self-Supervised Speech Representations CircuitSynth: Reliable Synthetic Data Generation Who Wrote This Line? Evaluating the Detection of LLM-Generated Classical Chinese Poetry SEPTQ: A Simple and Effective Post-Training Quantization Paradigm for Large Language Models Why Supervised Fine-Tuning Fails to Learn: A Systematic Study of Incomplete Learning in Large Language Models Reason Only When Needed: Efficient Generative Reward Modeling via Model-Internal Uncertainty ASPIRin: Action Space Projection for Interactivity-Optimized Reinforcement Learning in Full-Duplex Speech Language Models Mirroring Minds: Asymmetric Linguistic Accommodation and Diagnostic Identity in ADHD and Autism Reddit Communities Computational Implementation of a Model of Category-Theoretic Metaphor Comprehension CoSToM:Causal-oriented Steering for Intrinsic Theory-of-Mind Alignment in Large Language Models Weird Generalization is Weirdly Brittle Human vs. Machine Deception: Distinguishing AI-Generated and Human-Written Fake News Using Ensemble Learning Should We be Pedantic About Reasoning Errors in Machine Translation? Simulating Organized Group Behavior: New Framework, Benchmark, and Analysis Spoiler Alert: Narrative Forecasting as a Metric for Tension in LLM Storytelling Claim2Vec: Embedding Fact-Check Claims for Multilingual Similarity and Clustering GIANTS: Generative Insight Anticipation from Scientific Literature Many-Tier Instruction Hierarchy in LLM Agents Interactive ASR: Towards Human-Like Interaction and Semantic Coherence Evaluation for Agentic Speech Recognition Regime-Conditional Retrieval: Theory and a Transferable Router for Two-Hop QA ASTRA: Adaptive Semantic Tree Reasoning Architecture for Complex Table Question Answering Revisiting the Capacity Gap in Chain-of-Thought Distillation from a Practical Perspective Dictionary-Aligned Concept Control for Safeguarding Multimodal LLMs $p1$: Better Prompt Optimization with Fewer Prompts Every Response Counts: Quantifying Uncertainty of LLM-based Multi-Agent Systems through Tensor Decomposition Skip-Connected Policy Optimization for Implicit Advantage PRAGMA: Revolut Foundation Model Linear Representations of Hierarchical Concepts in Language Models Generating High Quality Synthetic Data for Dutch Medical Conversations HumorGen: Cognitive Synergy for Humor Generation in Large Language Models via Persona-Based Distillation Toward Generalized Cross-Lingual Hateful Language Detection with Web-Scale Data and Ensemble LLM Annotations Self-Calibrating Language Models via Test-Time Discriminative Distillation Re-Mask and Redirect: Exploiting Denoising Irreversibility in Diffusion Language Models H-AdminSim: A Multi-Agent Simulator for Realistic Hospital Administrative Workflows with FHIR Integration Reasoning Models Will Sometimes Lie About Their Reasoning
DeepSeek-V4: Towards Highly Efficient Million-Token Context Intelligence
[Submitted on 26 Apr 2026] · 2026-06-19 · via cs.CL updates on arXiv.org

Authors:DeepSeek-AI, Anyi Xu, Bangcai Lin, Bing Xue, Bingxuan Wang, Bingzheng Xu, Bochao Wu, Bowei Zhang, Chaofan Lin, Chen Dong, Chenchen Ling, Chengda Lu, Chenggang Zhao, Chengqi Deng, Chengyu Hou, Chenhao Xu, Chenze Shao, Chong Ruan, Conner Sun, Damai Dai, Daya Guo, Dejian Yang, Deli Chen, Donghao Li, Dongjie Ji, Erhang Li, Fang Wei, Fangyun Lin, Fangzhou Yuan, Feiyu Xia, Fucong Dai, Guangbo Hao, Guanting Chen, Guoai Cao, Guolai Meng, Guowei Li, Han Yu, Han Zhang, Hanwei Xu, Hao Li, Haofen Liang, Haoling Zhang, Haoming Luo, Haoran Wei, Haotian Yuan, Haowei Zhang, Haowen Luo, Haoyu Chen, Haozhe Ji, Hengqing Zhang, Honghui Ding, Hongxuan Tang, Huanqi Cao, Huazuo Gao, Hui Qu, Hui Zeng, J Yang, JQ Zhu, Jia Luo, Jia Song, Jia Yu, Jialiang Huang, Jialu Cai, Jian Liang, Jiangting Zhou, Jiasheng Ye, Jiashi Li, Jiaxin Xu, Jiewen Hu, Jieyu Yang, Jin Chen, Jin Yan, Jingchang Chen, Jingli Zhou, Jingting Xiang, Jingyang Yuan, Jingyuan Cheng, Jingzi Zhou, Jinhua Zhu, Jiping Yu, Joseph Sun, Jun Ran, Junguang Jiang, Junjie Qiu, Junlong Li, Junmin Zheng, Junxiao Song, Kai Dong, Kaige Gao, Kang Guan, Kexing Zhou, Kezhao Huang, Kuai Yu, Lean Wang, Lecong Zhang, Lei Wang, Leyi Xia, Li Zhang, Liang Zhao, Lihua Guo

et al. (219 additional authors not shown)

View PDF HTML (experimental)

Abstract:We present a preview version of DeepSeek-V4 series, including two strong Mixture-of-Experts (MoE) language models -- DeepSeek-V4-Pro with 1.6T parameters (49B activated) and DeepSeek-V4-Flash with 284B parameters (13B activated) -- both supporting a context length of one million tokens. DeepSeek-V4 series incorporate several key upgrades in architecture and optimization: (1) a hybrid attention architecture that combines Compressed Sparse Attention (CSA) and Heavily Compressed Attention (HCA) to improve long-context efficiency; (2) Manifold-Constrained Hyper-Connections (mHC) that enhance conventional residual connections; (3) and the Muon optimizer for faster convergence and greater training stability. We pre-train both models on more than 32T diverse and high-quality tokens, followed by a comprehensive post-training pipeline that unlocks and further enhances their capabilities. DeepSeek-V4-Pro-Max, the maximum reasoning effort mode of DeepSeek-V4-Pro, redefines the state-of-the-art for open models, outperforming its predecessors in core tasks. Meanwhile, DeepSeek-V4 series are highly efficient in long-context scenarios. In the one-million-token context setting, DeepSeek-V4-Pro requires only 27% of single-token inference FLOPs and 10% of KV cache compared with DeepSeek-V3.2. This enables us to routinely support one-million-token contexts, thereby making long-horizon tasks and further test-time scaling more feasible. The model checkpoints are available at this https URL.

Submission history

From: Wenfeng Liang [view email]
[v1] Sun, 26 Apr 2026 14:49:33 UTC (2,854 KB)