{ "id": "2212.14457", "version": "v1", "published": "2022-12-29T20:57:46.000Z", "updated": "2022-12-29T20:57:46.000Z", "title": "Bayesian Interpolation with Deep Linear Networks", "authors": [ "Boris Hanin", "Alexander Zlokapa" ], "categories": [ "stat.ML", "cs.LG", "math.PR" ], "abstract": "This article concerns Bayesian inference using deep linear networks with output dimension one. In the interpolating (zero noise) regime we show that with Gaussian weight priors and MSE negative log-likelihood loss both the predictive posterior and the Bayesian model evidence can be written in closed form in terms of a class of meromorphic special functions called Meijer-G functions. These results are non-asymptotic and hold for any training dataset, network depth, and hidden layer widths, giving exact solutions to Bayesian interpolation using a deep Gaussian process with a Euclidean covariance at each layer. Through novel asymptotic expansions of Meijer-G functions, a rich new picture of the role of depth emerges. Specifically, we find: ${\\bf \\text{The role of depth in extrapolation}}$: The posteriors in deep linear networks with data-independent priors are the same as in shallow networks with evidence maximizing data-dependent priors. In this sense, deep linear networks make provably optimal predictions. ${\\bf \\text{The role of depth in model selection}}$: Starting from data-agnostic priors, Bayesian model evidence in wide networks is only maximized at infinite depth. This gives a principled reason to prefer deeper networks (at least in the linear case). ${\\bf \\text{Scaling laws relating depth, width, and number of datapoints}}$: With data-agnostic priors, a novel notion of effective depth given by \\[\\#\\text{hidden layers}\\times\\frac{\\#\\text{training data}}{\\text{network width}}\\] determines the Bayesian posterior in wide linear networks, giving rigorous new scaling laws for generalization error.", "revisions": [ { "version": "v1", "updated": "2022-12-29T20:57:46.000Z" } ], "analyses": { "keywords": [ "deep linear networks", "bayesian interpolation", "bayesian model evidence", "meijer-g functions", "article concerns bayesian inference" ], "note": { "typesetting": "TeX", "pages": 0, "language": "en", "license": "arXiv", "status": "editable" } } }