{
  "id": "2109.14142",
  "version": "v4",
  "published": "2021-09-29T02:06:33.000Z",
  "updated": "2022-01-26T17:10:50.000Z",
  "title": "On the Provable Generalization of Recurrent Neural Networks",
  "authors": [
    "Lifu Wang",
    "Bo Shen",
    "Bo Hu",
    "Xing Cao"
  ],
  "comment": "Accepted to Neurips 2021",
  "categories": [
    "cs.LG",
    "stat.ML"
  ],
  "abstract": "Recurrent Neural Network (RNN) is a fundamental structure in deep learning. Recently, some works study the training process of over-parameterized neural networks, and show that over-parameterized networks can learn functions in some notable concept classes with a provable generalization error bound. In this paper, we analyze the training and generalization for RNNs with random initialization, and provide the following improvements over recent works: 1) For a RNN with input sequence $x=(X_1,X_2,...,X_L)$, previous works study to learn functions that are summation of $f(\\beta^T_lX_l)$ and require normalized conditions that $||X_l||\\leq\\epsilon$ with some very small $\\epsilon$ depending on the complexity of $f$. In this paper, using detailed analysis about the neural tangent kernel matrix, we prove a generalization error bound to learn such functions without normalized conditions and show that some notable concept classes are learnable with the numbers of iterations and samples scaling almost-polynomially in the input length $L$. 2) Moreover, we prove a novel result to learn N-variables functions of input sequence with the form $f(\\beta^T[X_{l_1},...,X_{l_N}])$, which do not belong to the \"additive\" concept class, i,e., the summation of function $f(X_l)$. And we show that when either $N$ or $l_0=\\max(l_1,..,l_N)-\\min(l_1,..,l_N)$ is small, $f(\\beta^T[X_{l_1},...,X_{l_N}])$ will be learnable with the number iterations and samples scaling almost-polynomially in the input length $L$.",
  "revisions": [
    {
      "version": "v4",
      "updated": "2022-01-26T17:10:50.000Z"
    }
  ],
  "analyses": {
    "keywords": [
      "recurrent neural network",
      "provable generalization",
      "generalization error bound",
      "notable concept classes",
      "learn functions"
    ],
    "note": {
      "typesetting": "TeX",
      "pages": 0,
      "language": "en",
      "license": "arXiv",
      "status": "editable"
    }
  }
}