arXiv Analytics

arXiv:2306.03968 [stat.ML]

Stochastic Marginal Likelihood Gradients using Neural Tangent Kernels

Alexander Immer, Tycho F. A. van der Ouderaa, Mark van der Wilk, Gunnar Rätsch, Bernhard Schölkopf

Published 2023-06-06 (Version 1)

Selecting hyperparameters in deep learning greatly impacts its effectiveness but requires manual effort and expertise. Recent works show that Bayesian model selection with Laplace approximations makes it possible to optimize such hyperparameters just like standard neural network parameters, using gradients on the training data. However, estimating a single hyperparameter gradient requires a pass through the entire dataset, limiting the scalability of such algorithms. In this work, we overcome this issue by introducing lower bounds to the linearized Laplace approximation of the marginal likelihood. In contrast to previous estimators, these bounds are amenable to stochastic-gradient-based optimization and allow trading off estimation accuracy against computational complexity. We derive them using the function-space form of the linearized Laplace approximation, which can be estimated using the neural tangent kernel. Experimentally, we show that the estimators can significantly accelerate gradient-based hyperparameter optimization.
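To illustrate the function-space view the abstract refers to, the sketch below computes a minibatch, NTK-based estimate of the linearized-Laplace marginal likelihood for regression and differentiates it with respect to hyperparameters. It is not the authors' code and does not implement the paper's lower bounds: the small MLP, the names ntk_gram and minibatch_lml, the choice of an isotropic Gaussian prior (precision delta) centered at the current parameters, and a Gaussian observation noise sigma^2 are all illustrative assumptions.

```python
# Hedged sketch (assumptions noted above), not the paper's estimator.
# Function-space linearized Laplace for regression on a minibatch:
#   log N(y | f(x; params), K / delta + sigma^2 I),
# where K is the empirical NTK Gram matrix and delta the prior precision,
# assuming a Gaussian prior centered at the current parameters.
import jax
import jax.numpy as jnp


def mlp(params, x):
    # Small illustrative network; the approach applies to any differentiable model.
    w1, b1, w2, b2 = params
    h = jnp.tanh(x @ w1 + b1)
    return (h @ w2 + b2).squeeze(-1)


def ntk_gram(params, x):
    # Empirical NTK on a minibatch: J J^T, with J the Jacobian of outputs w.r.t. parameters.
    jac = jax.jacobian(lambda p: mlp(p, x))(params)
    flat = jnp.concatenate(
        [j.reshape(x.shape[0], -1) for j in jax.tree_util.tree_leaves(jac)], axis=1
    )
    return flat @ flat.T


def minibatch_lml(hypers, params, x, y):
    # Gaussian log marginal likelihood of the linearized model on this minibatch.
    log_prior_prec, log_noise = hypers
    k = ntk_gram(params, x) / jnp.exp(log_prior_prec)
    cov = k + jnp.exp(log_noise) * jnp.eye(x.shape[0])
    resid = y - mlp(params, x)
    chol = jnp.linalg.cholesky(cov)
    alpha = jax.scipy.linalg.cho_solve((chol, True), resid)
    # -0.5 * resid^T cov^{-1} resid - 0.5 * logdet(cov) - 0.5 * n * log(2 pi)
    return (
        -0.5 * (resid @ alpha)
        - jnp.sum(jnp.log(jnp.diag(chol)))
        - 0.5 * x.shape[0] * jnp.log(2.0 * jnp.pi)
    )


# Stochastic hyperparameter gradient from a single minibatch.
hyper_grad = jax.grad(minibatch_lml)

# Example usage with random data (illustrative only).
key = jax.random.PRNGKey(0)
k1, k2, k3, k4 = jax.random.split(key, 4)
params = (
    jax.random.normal(k1, (3, 16)), jnp.zeros(16),
    jax.random.normal(k2, (16, 1)), jnp.zeros(1),
)
x = jax.random.normal(k3, (8, 3))
y = jax.random.normal(k4, (8,))
print(hyper_grad(jnp.array([0.0, 0.0]), params, x, y))
```

Because the Gram matrix only involves the sampled minibatch, the cost per gradient scales with the batch size rather than the full dataset, which is the trade-off between estimation accuracy and computational complexity that the paper's lower bounds formalize.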

Related articles:
arXiv:2007.05864 [stat.ML] (Published 2020-07-11)
Bayesian Deep Ensembles via the Neural Tangent Kernel
arXiv:2302.01629 [stat.ML] (Published 2023-02-03)
Beyond the Universal Law of Robustness: Sharper Laws for Random Features and Neural Tangent Kernels
arXiv:2107.12723 [stat.ML] (Published 2021-07-27)
Stability & Generalisation of Gradient Descent for Shallow Neural Networks without the Neural Tangent Kernel