arXiv:2402.04375 [cs.LG]

Bounding the Excess Risk for Linear Models Trained on Marginal-Preserving, Differentially-Private, Synthetic Data

Yvonne Zhou, Mingyu Liang, Ivan Brugere, Dana Dachman-Soled, Danial Dervovic, Antigoni Polychroniadou, Min Wu

Published 2024-02-06 (Version 1)

The growing use of machine learning (ML) has raised concerns that an ML model may reveal private information about individuals who contributed to the training dataset. To prevent leakage of sensitive data, we consider using differentially-private (DP) synthetic training data instead of real training data to train an ML model. A key desirable property of synthetic data is its ability to preserve the low-order marginals of the original distribution. Our main contribution comprises novel upper and lower bounds on the excess empirical risk of linear models trained on such synthetic data, for continuous and Lipschitz loss functions. We complement our theoretical results with extensive experiments.
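To make the setting concrete, below is a minimal, illustrative sketch, not the paper's actual mechanism or bounds: synthetic data is sampled from Laplace-noised one-way marginals (a crude stand-in for a marginal-preserving DP generator; privacy accounting across marginals is omitted), a logistic-regression model (linear, with a Lipschitz loss) is trained on it, and its excess empirical risk on the real data is measured against a model trained directly on the real data. The toy dataset and independent-marginal sampler are assumptions made purely for illustration.

```python
# Illustrative sketch only: DP one-way marginals -> synthetic data -> linear model,
# then excess empirical risk relative to a model trained on the real data.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import log_loss

rng = np.random.default_rng(0)

# Toy binary dataset: 3 binary features, binary label.
n, d = 2000, 3
X = rng.integers(0, 2, size=(n, d))
y = (X.sum(axis=1) + rng.integers(0, 2, size=n) >= 2).astype(int)

epsilon = 1.0  # per-marginal budget; composition across marginals is ignored here

def dp_marginal(col):
    """Noisy one-way marginal of a binary column via the Laplace mechanism."""
    counts = np.bincount(col, minlength=2).astype(float)
    noisy = np.clip(counts + rng.laplace(scale=1.0 / epsilon, size=2), 1e-6, None)
    return noisy / noisy.sum()

# Sample synthetic data from the (independent) noisy marginals.
X_syn = np.column_stack([rng.choice(2, size=n, p=dp_marginal(X[:, j])) for j in range(d)])
y_syn = rng.choice(2, size=n, p=dp_marginal(y))

# Linear models with a Lipschitz (logistic) loss.
real_model = LogisticRegression().fit(X, y)
syn_model = LogisticRegression().fit(X_syn, y_syn)

# Excess empirical risk on the real data: R(w_syn) - R(w_real).
risk_real = log_loss(y, real_model.predict_proba(X))
risk_syn = log_loss(y, syn_model.predict_proba(X))
print("excess empirical risk:", risk_syn - risk_real)
```

Because this sketch preserves only one-way marginals and samples features and labels independently, the gap it prints is typically large; richer low-order (e.g. pairwise) marginals are what make the synthetic-data model competitive, which is the regime the paper's bounds address.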

Related articles:
arXiv:2210.16405 [cs.LG] (Published 2022-10-28)
Evaluation of Categorical Generative Models -- Bridging the Gap Between Real and Synthetic Data
arXiv:2004.14046 [cs.LG] (Published 2020-04-29)
Reducing catastrophic forgetting with learning on synthetic data
arXiv:2307.03364 [cs.LG] (Published 2023-07-07)
Distilled Pruning: Using Synthetic Data to Win the Lottery