arXiv:2408.09310 [cs.LG]

Narrowing the Focus: Learned Optimizers for Pretrained Models

Gus Kristiansen, Mark Sandler, Andrey Zhmoginov, Nolan Miller, Anirudh Goyal, Jihwan Lee, Max Vladymyrov

Published 2024-08-17 (Version 1)

In modern deep learning, models are trained by applying gradient updates through an optimizer, which transforms the updates based on various statistics. Optimizers are typically hand-designed, and tuning their hyperparameters is a large part of the training process. Learned optimizers have shown some initial promise, but they are generally unsuccessful as a general optimization mechanism applicable to every problem. In this work we explore a different direction: instead of learning general optimizers, we specialize them to a specific training environment. We propose a novel optimizer technique that learns a layer-specific linear combination of update directions provided by a set of base optimizers, effectively adapting its strategy to the specific model and dataset. When evaluated on image classification tasks, this specialized optimizer significantly outperforms both traditional off-the-shelf methods such as Adam and existing general learned optimizers. Moreover, it demonstrates robust generalization with respect to model initialization, evaluation on unseen datasets, and training durations beyond its meta-training horizon.
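To make the core idea concrete, here is a minimal sketch (not the authors' implementation) of mixing per-layer update directions from several base optimizers with learned per-layer weights. It uses optax; all names (base_opts, mixing_weights, combine_updates) and the choice of base optimizers are hypothetical, and the mixing weights are simply given rather than meta-learned.

```python
# Sketch only: combine per-layer update directions from several base optimizers
# using a learned linear combination. Assumed names and optimizer choices are
# illustrative, not taken from the paper.
import jax.numpy as jnp
import optax

# Base optimizers whose candidate update directions are mixed per layer.
base_opts = [optax.sgd(1e-2), optax.adam(1e-3), optax.sgd(1e-2, momentum=0.9)]

def init_states(params):
    return [opt.init(params) for opt in base_opts]

def combine_updates(grads, states, params, mixing_weights):
    """mixing_weights: dict mapping layer name -> vector of len(base_opts);
    in the paper these coefficients are learned, here they are just given."""
    per_opt_updates, new_states = [], []
    for opt, state in zip(base_opts, states):
        upd, new_state = opt.update(grads, state, params)
        per_opt_updates.append(upd)
        new_states.append(new_state)
    # Per layer, take a linear combination of the candidate update directions.
    combined = {
        layer: sum(w * per_opt_updates[i][layer]
                   for i, w in enumerate(mixing_weights[layer]))
        for layer in params
    }
    return combined, new_states

# Toy usage with a two-layer parameter pytree.
params = {"layer1": jnp.ones((3,)), "layer2": jnp.ones((2,))}
grads = {"layer1": jnp.full((3,), 0.1), "layer2": jnp.full((2,), -0.2)}
weights = {"layer1": jnp.array([0.5, 0.3, 0.2]),
           "layer2": jnp.array([0.1, 0.8, 0.1])}
states = init_states(params)
updates, states = combine_updates(grads, states, params, weights)
params = optax.apply_updates(params, updates)
```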
