Lying Is Just a Phase: The Hidden Alignment Transition in Language Model Scaling
Adil Amin · 2026-05-20 · via cs.CL updates on arXiv.org


Abstract: Scaling laws predict loss from compute but not how capabilities interact. We measure the coupling between reasoning and truthfulness across 63 base models from 16 families and find a regime change invisible to loss curves: below a family-dependent critical scale $N_c$, the two capabilities anticorrelate; above it, they cooperate. We estimate $N_c \approx 3.5$B parameters (bootstrap 95% CI [2.9B, 13.4B]), but model size is not the only variable that determines the phase. Architecture, data curation, and training recipe each shift $N_c$ independently. Curated training eliminated the coupling dip between Qwen generations (0.025 $\to$ 0.830 at matched scale). Gemma-4 at 4B achieves a coupling of 0.871, characteristic of standard-trained models at 13B and above, through distillation and architectural innovation. Phi at 1B matches the coupling of web-trained models at 10B through data curation alone. Width normalization eliminates the anticorrelation across all tested families, supporting an output-projection bottleneck. Internally, 38 of 40 models show zero competing attention heads. A sparse-regression ODE cross-predicts held-out Llama-2 at 5.6% error. The diagnostic requires no model internals, only public benchmark scores across a model family. The cooperative regime extends to the frontier ($r = +0.72$, 34 models, 10 labs). A proof-of-concept intervention confirms the bottleneck is exploitable: adding a single truth-direction vector at the identified layer corrects 60% of misaligned outputs in the tax phase with zero retraining. This is a surgical, per-inference correction that requires no weight modification. Code, data, an open-source steering CLI for any open-weight model, and an interactive dashboard for phase diagnosis are released: this https URL.
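The abstract says the diagnostic needs only public benchmark scores across a model family and reports a bootstrap 95% CI. A minimal sketch of that kind of computation follows. It assumes "coupling" can be approximated by the Pearson correlation between a reasoning score and a truthfulness score within a group of models; the abstract does not give the paper's actual definition, so that choice is an assumption. The function names (`pearson`, `bootstrap_ci`) and the toy scores are hypothetical illustrations, not the released code or data.

```python
import random
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length score lists (nan if degenerate)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return float("nan")
    return cov / sqrt(vx * vy)


def bootstrap_ci(xs, ys, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the correlation, resampling models with replacement."""
    rng = random.Random(seed)
    n = len(xs)
    stats = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        r = pearson([xs[i] for i in idx], [ys[i] for i in idx])
        if r == r:  # drop degenerate resamples that produced nan
            stats.append(r)
    stats.sort()
    lo = stats[int((alpha / 2) * len(stats))]
    hi = stats[min(len(stats) - 1, int((1 - alpha / 2) * len(stats)))]
    return lo, hi


# Invented toy scores for illustration only (NOT results from the paper).
# Assumed proxy: "coupling" = correlation of the two scores within a size group.
small_reasoning = [0.21, 0.25, 0.30, 0.34, 0.38, 0.41]
small_truthful = [0.47, 0.45, 0.44, 0.41, 0.40, 0.37]   # falls as reasoning rises
large_reasoning = [0.55, 0.61, 0.66, 0.72, 0.78, 0.83]
large_truthful = [0.42, 0.47, 0.49, 0.55, 0.58, 0.63]   # rises with reasoning

coupling_small = pearson(small_reasoning, small_truthful)
coupling_large = pearson(large_reasoning, large_truthful)
ci_small = bootstrap_ci(small_reasoning, small_truthful)
ci_large = bootstrap_ci(large_reasoning, large_truthful)
```

Under this proxy, a negative value in the small-model group and a positive value in the large-model group would correspond to the anticorrelated and cooperative regimes the abstract describes. Locating $N_c$ itself would need the paper's own procedure, which is not reproduced here.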
Comments: 15 pages, 8 figures, 2 tables. Companion paper: "The Growing Pains of Frontier Models: When Leaderboards Stop Separating and What to Measure Next" (this https URL). Code: this https URL. Dashboard: this https URL
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Cite as: arXiv:2605.18838 [cs.LG]
  (or arXiv:2605.18838v2 [cs.LG] for this version)
  https://doi.org/10.48550/arXiv.2605.18838

arXiv-issued DOI via DataCite

Submission history

From: Adil Amin
[v1] Wed, 13 May 2026 03:14:09 UTC (1,736 KB)
[v2] Sat, 23 May 2026 21:02:16 UTC (1,737 KB)