arXiv:1802.09583 [cs.LG]

Data-dependent PAC-Bayes priors via differential privacy

Gintare Karolina Dziugaite, Daniel M. Roy

Published 2018-02-26 (Version 1)

The Probably Approximately Correct (PAC) Bayes framework (McAllester, 1999) can incorporate knowledge about the learning algorithm and data distribution through the use of distribution-dependent priors, yielding tighter generalization bounds on data-dependent posteriors. Exploiting this flexibility, however, is difficult, especially when the data distribution is presumed to be unknown. We show that an ε-differentially private data-dependent prior yields a valid PAC-Bayes bound, and then show that non-private mechanisms for choosing priors obtain the same generalization bound provided they converge weakly to the private mechanism. As an application of this result, we show that a Gaussian prior mean chosen via stochastic gradient Langevin dynamics (SGLD; Welling and Teh, 2011) leads to a valid PAC-Bayes bound, despite SGLD converging only weakly to an ε-differentially private mechanism. Because the bounds are data-dependent, we study them empirically on synthetic data and standard neural network benchmarks to illustrate the gains of data-dependent priors over existing distribution-dependent PAC-Bayes bounds.
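
For intuition, the display below sketches the classical PAC-Bayes-kl bound (Langford and Seeger, 2001; Maurer, 2004) with the prior written as a data-dependent P(S), followed by the SGLD iterate used to pick a Gaussian prior mean. The ε-dependent correction term and exact constants established in the paper are not reproduced, so this should be read as a sketch of the quantities involved rather than the paper's theorem.

% Sketch (not the paper's exact statement): classical PAC-Bayes-kl bound with
% the prior written as a data-dependent P(S). With probability at least
% 1 - \delta over an i.i.d. sample S of size n, simultaneously for all
% posteriors Q,
\[
  \mathrm{kl}\!\big(\hat{L}_S(Q) \,\big\|\, L_{\mathcal{D}}(Q)\big)
  \;\le\;
  \frac{\mathrm{KL}\!\left(Q \,\|\, P(S)\right) + \ln\!\left(2\sqrt{n}/\delta\right)}{n},
\]
% where \hat{L}_S(Q) is the empirical Gibbs risk and L_{\mathcal{D}}(Q) the
% population Gibbs risk. The paper shows a bound of this form remains valid
% when P(S) is chosen by an \epsilon-differentially private mechanism, up to
% an \epsilon-dependent additive term omitted here.
%
% SGLD iterate (Welling and Teh, 2011) used to select the Gaussian prior mean,
% with step size \eta_t, a stochastic (minibatch) gradient \widehat{\nabla} of
% a suitably scaled empirical objective, and injected Gaussian noise \xi_t:
\[
  w_{t+1} \;=\; w_t \;-\; \frac{\eta_t}{2}\,\widehat{\nabla}\hat{L}_S(w_t)
  \;+\; \sqrt{\eta_t}\,\xi_t,
  \qquad \xi_t \sim \mathcal{N}(0, I).
\]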

Comments: 17 pages, 2 figures; subsumes and extends some results first reported in arXiv:1712.09376
Categories: cs.LG, stat.ML
Related articles:
arXiv:1901.09136 [cs.LG] (Published 2019-01-26)
Graphical-model based estimation and inference for differential privacy
arXiv:2405.06627 [cs.LG] (Published 2024-05-10)
Conformal Validity Guarantees Exist for Any Data Distribution
arXiv:2203.11556 [cs.LG] (Published 2022-03-22)
VQ-Flows: Vector Quantized Local Normalizing Flows