Polymorphism Is Rotation: Operational Mechanistic Interpretability from a Two-Layer Transformer to Pythia-70m
Jordan F. Mc · 2026-05-26 · via cs.CL updates on arXiv.org


Abstract: Independently trained transformers compute the same function in residual-stream bases that differ by a uniform random rotation on $\mathrm{SO}(d_{\mathrm{model}})$. We call this phenomenon polymorphism: same function, mutually unintelligible interior coordinates. One matrix multiplication per model pair removes it: an orthogonal Procrustes fit on a single batch of activations transfers sparse-autoencoder feature dictionaries and steering vectors between independently trained models, with no retraining.
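As a rough illustration of the orthogonal Procrustes fit named above, the following NumPy sketch recovers a rotation from one batch of paired activations. It is not the paper's code: the names `procrustes_rotation`, `acts_a`, `acts_b`, and `steer_a`, the sizes, and the synthetic data are all assumptions made for the example, and the sketch does not enforce $\det R = +1$.

```python
import numpy as np


def procrustes_rotation(acts_a, acts_b):
    """Return the orthogonal R that minimizes ||acts_a @ R - acts_b||_F."""
    # Closed-form solution: SVD of the cross-covariance, then drop the
    # singular values. This yields an orthogonal matrix (det may be -1).
    u, _, vt = np.linalg.svd(acts_a.T @ acts_b)
    return u @ vt


rng = np.random.default_rng(0)
d_model, batch = 16, 256  # hypothetical sizes, chosen only for the demo

# Assumed toy setup: "model B" sees the same activations as "model A",
# expressed in a randomly rotated basis.
true_rot, _ = np.linalg.qr(rng.standard_normal((d_model, d_model)))
acts_a = rng.standard_normal((batch, d_model))
acts_b = acts_a @ true_rot

R = procrustes_rotation(acts_a, acts_b)

# Transferring a (hypothetical) steering vector is one matrix multiplication.
steer_a = rng.standard_normal(d_model)
steer_b = steer_a @ R
```

In this noiseless toy case the fit recovers `true_rot` up to floating-point error; with real activations from two models the fit would only be approximate.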
The phenomenon is invisible to the standard SAE universality metric. Decoder-column cosine similarity matches across seeds at 98%, the SAE-universality headline number, while an SAE trained on one seed reconstructs another seed's activations at negative explained variance, worse than predicting the constant mean. The decoder columns align; the encoder reads from a rotated frame. A single Procrustes rotation $R$ restores reconstruction to within 0.025 EV of the within-seed ceiling at every internal site.
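To make "negative explained variance, worse than predicting the constant mean" concrete, here is a small sketch under stated assumptions. It does not train an SAE: a perfect reconstruction read through a hypothetical random rotation `rot` stands in for an encoder reading from a rotated frame, and `explained_variance` is a plain $1 - \mathrm{SSE}/\mathrm{SST}$ definition assumed for the example.

```python
import numpy as np


def explained_variance(x, x_hat):
    """1 - (residual sum of squares) / (sum of squares around the mean)."""
    resid = np.sum((x - x_hat) ** 2)
    total = np.sum((x - x.mean(axis=0)) ** 2)
    return 1.0 - resid / total


rng = np.random.default_rng(0)
d_model, batch = 64, 1024  # hypothetical sizes
x = rng.standard_normal((batch, d_model))

# Baseline: predicting the constant mean gives EV = 0 by construction.
ev_mean = explained_variance(x, np.broadcast_to(x.mean(axis=0), x.shape))

# Toy stand-in for a cross-seed readout: the right vectors, wrong frame.
rot, _ = np.linalg.qr(rng.standard_normal((d_model, d_model)))
x_rotated = x @ rot
ev_rotated = explained_variance(x, x_rotated)

# Undoing the rotation restores the reconstruction.
ev_restored = explained_variance(x, x_rotated @ rot.T)

print(ev_mean, ev_rotated, ev_restored)
```

Here `ev_rotated` comes out negative because the rotated reconstruction is farther from `x` than the mean is, while `ev_restored` returns to 1 once the rotation is inverted.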
$R$ is Haar-distributed: $\|R - I\|_F$ matches the random-orthogonal prediction $\sqrt{2 d_{\mathrm{model}}}$ to 0.1% at $d_{\mathrm{model}} = 512$, and a Kolmogorov-Smirnov test of $R$'s eigenvalue spectrum against Haar $\mathrm{SO}(d_{\mathrm{model}})$ returns $p \approx 1.000$ pooled and per-pair. Diff-of-means steering vectors transfer in three regimes by alignment with $R$'s invariant subspace: clean when pinned by shared output weights, partial when overlapping the rotated subspace, inverted otherwise. With no shared I/O (Pythia), all three collapse to universally inverted. The same rotation account holds across training checkpoints within a single run.
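The $\sqrt{2 d_{\mathrm{model}}}$ prediction follows from $\|R - I\|_F^2 = 2 d_{\mathrm{model}} - 2\,\mathrm{tr}(R)$ together with the fact that the trace of a Haar-random rotation is small compared with $d_{\mathrm{model}}$. A minimal numerical check, assuming a QR-based Haar sampler (the helper `haar_rotation` is hypothetical and not taken from the paper):

```python
import numpy as np


def haar_rotation(d, rng):
    """Sample a Haar-distributed rotation in SO(d) from a Gaussian matrix."""
    q, r = np.linalg.qr(rng.standard_normal((d, d)))
    # Fixing the signs of R's diagonal makes Q Haar-distributed on O(d).
    q = q * np.sign(np.diag(r))
    # Flip one column if needed so that det(Q) = +1, landing in SO(d).
    sign, _ = np.linalg.slogdet(q)
    if sign < 0:
        q[:, 0] = -q[:, 0]
    return q


rng = np.random.default_rng(0)
d_model = 512
R = haar_rotation(d_model, rng)

dist = np.linalg.norm(R - np.eye(d_model), "fro")
predicted = np.sqrt(2 * d_model)  # equals 32 for d_model = 512
rel_err = abs(dist - predicted) / predicted
print(dist, predicted, rel_err)
```

Because $\mathrm{tr}(R)$ is of order one while $2 d_{\mathrm{model}} = 1024$, the relative deviation from $\sqrt{2 d_{\mathrm{model}}}$ is small for a random rotation of this size.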
Validated on a 104k-parameter Dyck-3 transformer and nine independently-trained Pythia-70m seeds on The Pile, via a pre-registered four-bar operational framework. Frontier-scale (10B+) replication remains open.
Comments: 26 pages, 4 figures, 40 references. Pre-registered four-bar framework; all numerical claims reproducible
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
ACM classes: I.2.6; I.2.7
Cite as: arXiv:2605.24577 [cs.LG]
  (or arXiv:2605.24577v1 [cs.LG] for this version)
  https://doi.org/10.48550/arXiv.2605.24577

arXiv-issued DOI via DataCite (pending registration)

Submission history

From: Jordan McCann
[v1] Sat, 23 May 2026 13:37:59 UTC (66 KB)