Gradient descent follows the regularization path for general losses
Ziwei Ji, Miroslav Dudík, Robert Schapire, Matus Telgarsky
Subject areas: Loss functions, Classification, Convex optimization
Presented in: Session 3B, Session 3D
Abstract:
Recent work across many machine learning disciplines has highlighted that standard descent methods, even without explicit regularization, do not merely minimize the training error, but also exhibit an \emph{implicit bias}. This bias is typically towards a certain regularized solution and depends on the details of the learning process, for instance the use of the cross-entropy loss.

In this work, we show that for empirical risk minimization over linear predictors with \emph{arbitrary} convex, strictly decreasing losses, if the risk does not attain its infimum, then the gradient-descent path and the \emph{algorithm-independent} regularization path converge to the same direction (whenever either converges to a direction). Using this result, we provide a justification for the widely used exponentially-tailed losses (such as the exponential loss or the logistic loss): for these losses, convergence to a direction is necessarily to the maximum-margin direction, whereas other losses, such as polynomially-tailed losses, may induce convergence to a direction with a poor margin.
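As a numerical illustration of the phenomenon described above (a minimal sketch, not the authors' code: the dataset, the choice of logistic loss, the step size, the iteration count, and the ridge-style parameterization R(w) + λ‖w‖² of the regularization path are all illustrative assumptions), the following Python script runs plain gradient descent on the logistic loss over a linearly separable dataset whose maximum-margin direction is (1, 0) by construction, and separately minimizes the regularized risk for a few shrinking values of λ:

import numpy as np
from scipy.optimize import minimize
from scipy.special import expit  # numerically stable sigmoid

# Illustrative linearly separable data (not from the paper).
# By construction the maximum-margin direction is (1, 0) with margin 3;
# the point (4, 3) is not a support vector but breaks the symmetry.
X = np.array([[3.0, 1.0], [3.0, -1.0], [-3.0, 1.0], [-3.0, -1.0], [4.0, 3.0]])
y = np.array([1.0, 1.0, -1.0, -1.0, 1.0])

def grad_logistic_risk(w):
    """Gradient of (1/n) * sum_i log(1 + exp(-y_i <w, x_i>))."""
    margins = y * (X @ w)
    coeffs = -y * expit(-margins)  # loss derivative: l'(m) = -1 / (1 + exp(m))
    return (coeffs[:, None] * X).mean(axis=0)

# Gradient-descent path: the risk infimum (zero) is not attained, so
# ||w_t|| diverges while the direction w_t / ||w_t|| stabilizes.
w = np.zeros(2)
eta = 0.5  # below 2/L for this dataset, so plain GD is stable
for t in range(1, 200_001):
    w -= eta * grad_logistic_risk(w)
    if t % 40_000 == 0:
        d = w / np.linalg.norm(w)
        print(f"GD step {t:7d}  ||w|| = {np.linalg.norm(w):7.3f}  "
              f"direction = ({d[0]:+.4f}, {d[1]:+.4f})")

# Regularization path (one common parameterization): minimizers of
# the risk plus a ridge penalty, for decreasing lambda.
def reg_risk(w, lam):
    margins = y * (X @ w)
    # log(1 + exp(-m)) computed stably as logaddexp(0, -m)
    return np.logaddexp(0.0, -margins).mean() + lam * (w @ w)

for lam in [1e-1, 1e-3, 1e-6]:
    res = minimize(reg_risk, x0=np.zeros(2), args=(lam,), method="BFGS")
    d = res.x / np.linalg.norm(res.x)
    print(f"lambda = {lam:.0e}  direction = ({d[0]:+.4f}, {d[1]:+.4f})")

Both printed directions drift toward (1, 0), the maximum-margin direction, consistent with the result stated in the abstract; note that directional convergence of gradient descent on exponentially-tailed losses is known to be slow (logarithmic in the iteration count), so the gradient-descent prints only trend toward the limit rather than reaching it.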