

























A well-conditioned Jacobian spectrum has a vital role in preventing exploding or vanishing gradients and speeding up learning of deep neural networks. Free probability theory helps us to understand and handle the Jacobian spectrum. We rigorously show almost sure asymptotic freeness of layer-wise Jacobians of deep neural networks as the wide limit. In particular, we treat the case that weights are initialized as Haar distributed orthogonal matrices.
此内容由惯性聚合(RSS阅读器)自动聚合整理,仅供阅读参考。 原文来自 — 版权归原作者所有。