




























We propose a new loss function for neural network classification that extends the hinge loss by assigning gradients to its critical points. We show that, for a linear classifier trained on linearly separable data with a fixed step size, the margin attained under this modified hinge loss converges to the $\ell_2$ max-margin at a rate of $\mathcal{O}(1/t)$. This is faster than the $\mathcal{O}(1/\log t)$ rate of exponential losses such as the logistic loss. Furthermore, empirical results suggest that this faster convergence carries over to ReLU networks.
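
To make the notion of a critical point concrete, the following is a minimal sketch of the standard (unmodified) hinge loss for a linear classifier without a bias term. It does not implement the proposed loss, whose exact form is not given here. The toy dataset and the names `scores`, `hinge_loss`, `hinge_subgradient`, and `normalized_margin` are assumptions made for illustration only.

```python
import math

def scores(w, X, y):
    """Per-sample signed scores y_i * <w, x_i>."""
    return [yi * sum(wj * xj for wj, xj in zip(w, xi)) for xi, yi in zip(X, y)]

def hinge_loss(w, X, y):
    """Mean standard hinge loss, max(0, 1 - y_i * <w, x_i>)."""
    return sum(max(0.0, 1.0 - s) for s in scores(w, X, y)) / len(y)

def hinge_subgradient(w, X, y):
    """One subgradient of the mean hinge loss.

    Samples with score >= 1 contribute nothing, so the result is zero
    as soon as every sample is classified with score at least 1.
    """
    grad = [0.0] * len(w)
    for xi, yi, s in zip(X, y, scores(w, X, y)):
        if s < 1.0:
            for j, xj in enumerate(xi):
                grad[j] -= yi * xj / len(y)
    return grad

def normalized_margin(w, X, y):
    """Smallest signed score divided by the l2 norm of w."""
    return min(scores(w, X, y)) / math.sqrt(sum(wj * wj for wj in w))

# Hypothetical linearly separable toy data (illustrative assumption).
X = [[2.0, 0.0], [1.0, 1.0], [-2.0, 0.0], [-1.0, -1.0]]
y = [1.0, 1.0, -1.0, -1.0]

w_stalled = [1.0, 0.0]  # zero loss and zero subgradient
w_better = [1.0, 1.0]   # a direction with a larger normalized margin

print(hinge_loss(w_stalled, X, y))
print(hinge_subgradient(w_stalled, X, y))
print(normalized_margin(w_stalled, X, y), normalized_margin(w_better, X, y))
```

On this toy data, `w_stalled` has zero loss and a zero subgradient, yet `w_better` separates the same points with a larger normalized margin. Gradient descent on the standard hinge loss therefore receives no signal to keep improving the margin once it reaches such a point, which is the situation that assigning gradients to critical points is meant to address.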