{ "id": "2409.03703", "version": "v1", "published": "2024-09-05T16:59:56.000Z", "updated": "2024-09-05T16:59:56.000Z", "title": "Iterative thresholding for non-linear learning in the strong $\\varepsilon$-contamination model", "authors": [ "Arvind Rathnashyam", "Alex Gittens" ], "comment": "35 pages", "categories": [ "stat.ML", "cs.LG" ], "abstract": "We derive approximation bounds for learning single neuron models using thresholded gradient descent when both the labels and the covariates are possibly corrupted adversarially. We assume the data follows the model $y = \\sigma(\\mathbf{w}^{*} \\cdot \\mathbf{x}) + \\xi,$ where $\\sigma$ is a nonlinear activation function, the noise $\\xi$ is Gaussian, and the covariate vector $\\mathbf{x}$ is sampled from a sub-Gaussian distribution. We study sigmoidal, leaky-ReLU, and ReLU activation functions and derive a $O(\\nu\\sqrt{\\epsilon\\log(1/\\epsilon)})$ approximation bound in $\\ell_{2}$-norm, with sample complexity $O(d/\\epsilon)$ and failure probability $e^{-\\Omega(d)}$. We also study the linear regression problem, where $\\sigma(\\mathbf{x}) = \\mathbf{x}$. We derive a $O(\\nu\\epsilon\\log(1/\\epsilon))$ approximation bound, improving upon the previous $O(\\nu)$ approximation bounds for the gradient-descent based iterative thresholding algorithms of Bhatia et al. (NeurIPS 2015) and Shen and Sanghavi (ICML 2019). Our algorithm has a $O(\\textrm{polylog}(N,d)\\log(R/\\epsilon))$ runtime complexity when $\\|\\mathbf{w}^{*}\\|_2 \\leq R$, improving upon the $O(\\text{polylog}(N,d)/\\epsilon^2)$ runtime complexity of Awasthi et al. (NeurIPS 2022).", "revisions": [ { "version": "v1", "updated": "2024-09-05T16:59:56.000Z" } ], "analyses": { "keywords": [ "contamination model", "approximation bound", "iterative thresholding", "non-linear learning", "runtime complexity" ], "note": { "typesetting": "TeX", "pages": 35, "language": "en", "license": "arXiv", "status": "editable" } } }