On the Non-decoupling of Supervised Fine-tuning and Reinforcement Learning in Post-training

Xueyan Niu, Bo Bai, Wei Han, Weixi Zhang · 2026-01-12 · via cs.IT updates on arXiv.org
