
























Abstract: We develop a mean-field theory of dropout as a perturbation of critical signal propagation at the edge of chaos. Dropout shifts the perfect-alignment fixed point, making the depth scale for information propagation finite even at critical initialization. We derive critical and crossover scaling laws for correlation decay and establish that smooth activations and kinked, ReLU-like activations constitute distinct universality classes, with different critical exponents and a universal two-parameter scaling collapse in detuning and dropout strength. The distinction traces to the analytic structure of the correlation map: smooth activations admit a Taylor expansion near perfect alignment, while kinked activations develop a branch point with universal non-analyticity. As a corollary, the framework yields saturated dropout profiles under fixed budget; a rank-flow tie-breaker then selects front-loaded schedules, substantially reducing held-out test loss at no extra computational cost, with accuracy gains as a consistent secondary effect. We test the predictions in MLPs and Vision Transformers and discuss CNN/ResNet extensions.
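As a rough illustration of the fixed-point shift described in the abstract, the sketch below iterates a textbook mean-field correlation map for ReLU. It is not taken from the paper: it assumes the critical initialization $\sigma_w^2 = 2$, $\sigma_b = 0$, and independent inverted-dropout masks with keep probability $\rho$ for the two inputs, under which the one-layer map is $c \mapsto \rho\, f(c)$ with $f(c) = \big(\sqrt{1 - c^2} + (\pi - \arccos c)\, c\big)/\pi$. The names `relu_corr_map`, `iterate`, and `keep_prob` are hypothetical.

```python
import numpy as np


def relu_corr_map(c, keep_prob=1.0):
    """One layer of an assumed mean-field correlation map for ReLU.

    Assumes sigma_w^2 = 2, sigma_b = 0 and independent dropout masks on the
    two inputs, so the no-dropout map f(c) is simply scaled by keep_prob.
    """
    c = np.clip(c, -1.0, 1.0)
    f = (np.sqrt(1.0 - c**2) + (np.pi - np.arccos(c)) * c) / np.pi
    return keep_prob * f


def iterate(c0, keep_prob, depth):
    """Return the correlation trajectory c_0, c_1, ..., c_depth."""
    traj = [c0]
    for _ in range(depth):
        traj.append(float(relu_corr_map(traj[-1], keep_prob)))
    return np.array(traj)


# Without dropout the correlation creeps toward the perfect-alignment
# fixed point c = 1; with dropout it settles at a fixed point below 1.
no_drop = iterate(0.5, keep_prob=1.0, depth=200)
with_drop = iterate(0.5, keep_prob=0.9, depth=200)

print("final correlation, no dropout:  ", no_drop[-1])
print("final correlation, keep_prob=0.9:", with_drop[-1])
```

In this toy setting, `keep_prob=1.0` leaves $c = 1$ as the fixed point and the approach to it is slow, whereas any `keep_prob < 1` moves the fixed point strictly below 1 and makes the map a contraction there, which is one way to picture a finite depth scale at critical initialization.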
| Comments: | Accepted at the 43rd International Conference on Machine Learning (ICML 2026). 36 pages, 11 figures |
| Subjects: | Machine Learning (cs.LG); Disordered Systems and Neural Networks (cond-mat.dis-nn); Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML) |
| Cite as: | arXiv:2605.21648 [cs.LG] (or arXiv:2605.21648v1 [cs.LG] for this version) |
| DOI: | https://doi.org/10.48550/arXiv.2605.21648 (arXiv-issued DOI via DataCite, pending registration) |
From: Lucas Fernandez-Sarmiento
[v1] Wed, 20 May 2026 19:00:02 UTC (641 KB)