Forget BIT, It is All about TOKEN: Towards Semantic Information Theory for LLMs
Bo Bai · 2025-11-03 · via cs.IT updates on arXiv.org

Despite the empirical successes of Large Language Models (LLMs), the prevailing paradigm is heuristic and experiment-driven, tethered to massive compute and data, while a first-principles theory remains absent. This treatise develops a Semantic Information Theory at the confluence of statistical physics, signal processing, and classical information theory, organized around a single paradigm shift: replacing the classical BIT, a microscopic substrate devoid of semantic content, with the macroscopic TOKEN as the atomic carrier of meaning and reasoning. Within this framework we recast attention and the Transformer as energy-based models, and interpret semantic embedding as vectorization on the semantic manifold. Modeling the LLM as a stateful channel with feedback, we adopt Massey's directed information as the native causal measure of autoregressive generation, from which we derive a directed rate-distortion function for pre-training, a directed rate-reward function for RL-based post-training, and a sub-martingale account of inference-time semantic information flow. This machinery makes precise the identification of next-token prediction with Granger causal inference, and sharpens the limits of LLM reasoning against Pearl's Ladder of Causation, affirming that whereas the BIT defined the Information Epoch, the TOKEN will define the AI Epoch.
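
For readers unfamiliar with the central quantity named in the abstract, the standard textbook definition of Massey's directed information is sketched below. The symbols $X^n$, $Y^n$, and $N$ are assumptions chosen for illustration, not taken from the paper, which may use different notation or a token-level variant.

```latex
% Directed information from an input sequence X^N to an output sequence Y^N
% (Massey, 1990). Notation here is illustrative, not the paper's own.
\[
  I(X^N \to Y^N) \;=\; \sum_{n=1}^{N} I\!\left(X^{n};\, Y_n \,\middle|\, Y^{n-1}\right)
\]
% Compare with the chain-rule expansion of ordinary mutual information,
% where each term conditions on the full input sequence X^N:
\[
  I(X^N ; Y^N) \;=\; \sum_{n=1}^{N} I\!\left(X^{N};\, Y_n \,\middle|\, Y^{n-1}\right)
\]
```

The only difference is that each term of the directed version uses the inputs up to time $n$ rather than the whole sequence $X^N$. This makes the measure causal, which is why it is a natural fit for channels with feedback and for autoregressive generation.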