{
  "id": "2212.14457",
  "version": "v1",
  "published": "2022-12-29T20:57:46.000Z",
  "updated": "2022-12-29T20:57:46.000Z",
  "title": "Bayesian Interpolation with Deep Linear Networks",
  "authors": [
    "Boris Hanin",
    "Alexander Zlokapa"
  ],
  "categories": [
    "stat.ML",
    "cs.LG",
    "math.PR"
  ],
  "abstract": "This article concerns Bayesian inference using deep linear networks with output dimension one. In the interpolating (zero-noise) regime we show that with Gaussian weight priors and MSE negative log-likelihood loss both the predictive posterior and the Bayesian model evidence can be written in closed form in terms of a class of meromorphic special functions called Meijer-G functions. These results are non-asymptotic and hold for any training dataset, network depth, and hidden layer widths, giving exact solutions to Bayesian interpolation using a deep Gaussian process with a Euclidean covariance at each layer. Through novel asymptotic expansions of Meijer-G functions, a rich new picture of the role of depth emerges. Specifically, we find: ${\\bf \\text{The role of depth in extrapolation}}$: The posteriors in deep linear networks with data-independent priors are the same as in shallow networks with evidence-maximizing data-dependent priors. In this sense, deep linear networks make provably optimal predictions. ${\\bf \\text{The role of depth in model selection}}$: Starting from data-agnostic priors, Bayesian model evidence in wide networks is only maximized at infinite depth. This gives a principled reason to prefer deeper networks (at least in the linear case). ${\\bf \\text{Scaling laws relating depth, width, and number of datapoints}}$: With data-agnostic priors, a novel notion of effective depth given by \\[\\#\\text{hidden layers}\\times\\frac{\\#\\text{training data}}{\\text{network width}}\\] determines the Bayesian posterior in wide linear networks, giving rigorous new scaling laws for generalization error.",
  "revisions": [
    {
      "version": "v1",
      "updated": "2022-12-29T20:57:46.000Z"
    }
  ],
  "analyses": {
    "keywords": [
      "deep linear networks",
      "bayesian interpolation",
      "bayesian model evidence",
      "meijer-g functions",
      "bayesian inference"
    ],
    "note": {
      "typesetting": "TeX",
      "pages": 0,
      "language": "en",
      "license": "arXiv",
      "status": "editable"
    }
  }
}