arXiv Analytics

arXiv:2002.03555 [cs.LG]

Supervised Learning: No Loss No Cry

Richard Nock, Aditya Krishna Menon

Published 2020-02-10 (Version 1)

Supervised learning requires the specification of a loss function to minimise. While the theory of admissible losses, from both a computational and a statistical perspective, is well developed, it offers a panoply of different choices. In practice, this choice is typically made in an ad hoc manner. In hopes of making this procedure more principled, the problem of learning the loss function for a downstream task (e.g., classification) has garnered recent interest. However, works in this area have been generally empirical in nature. In this paper, we revisit the SLIsotron algorithm of Kakade et al. (2011) through a novel lens, derive a generalisation based on Bregman divergences, and show how it provides a principled procedure for learning the loss. In detail, we cast SLIsotron as learning a loss from a family of composite square losses. By interpreting this through the lens of proper losses, we derive a generalisation of SLIsotron based on Bregman divergences. The resulting BregmanTron algorithm jointly learns the loss along with the classifier. It comes equipped with a simple guarantee of convergence for the loss it learns, and its set of possible outputs comes with a guarantee of agnostic approximability of the Bayes rule. Experiments indicate that the BregmanTron substantially outperforms the SLIsotron, and that the loss it learns can be minimised by other algorithms for different tasks, thereby opening up the interesting problem of loss transfer between domains.
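
The abstract builds on Bregman divergences as the device for generalising composite square losses, but gives no implementation details. The sketch below is not the paper's BregmanTron algorithm; it is only a minimal illustration of the standard Bregman divergence D_phi(x, y) = phi(x) - phi(y) - <grad phi(y), x - y>, showing how two familiar generators recover the squared loss and the KL divergence. All function names here are hypothetical and chosen for illustration.

```python
import numpy as np

def bregman_divergence(phi, grad_phi, x, y):
    """Standard Bregman divergence: phi(x) - phi(y) - <grad phi(y), x - y>."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    return phi(x) - phi(y) - np.dot(grad_phi(y), x - y)

# Generator 0.5 * ||v||^2 recovers the squared Euclidean loss.
phi_sq = lambda v: 0.5 * np.dot(v, v)
grad_phi_sq = lambda v: v

# Negative Shannon entropy as generator recovers the KL divergence
# (for vectors on the probability simplex).
def phi_ent(v):
    return np.sum(v * np.log(v))

def grad_phi_ent(v):
    return np.log(v) + 1.0

if __name__ == "__main__":
    x = np.array([0.2, 0.8])
    y = np.array([0.5, 0.5])
    print(bregman_divergence(phi_sq, grad_phi_sq, x, y))    # 0.5 * ||x - y||^2 = 0.09
    print(bregman_divergence(phi_ent, grad_phi_ent, x, y))  # KL(x || y) ~= 0.1927
```

Different choices of the convex generator phi yield different losses, which is the sense in which a family of Bregman divergences parameterises a family of losses that an algorithm could search over.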

Related articles:
arXiv:1901.09178 [cs.LG] (Published 2019-01-26)
A general model for plane-based clustering with loss function
arXiv:1903.02893 [cs.LG] (Published 2019-03-07)
Only sparsity based loss function for learning representations
arXiv:2006.04751 [cs.LG] (Published 2020-06-08)
The Golden Ratio of Learning and Momentum