arXiv Analytics

arXiv:2012.00732 [cs.LG]

Convergence and Sample Complexity of SGD in GANs

Vasilis Kontonis, Sihan Liu, Christos Tzamos

Published 2020-12-01Version 1

We provide theoretical convergence guarantees for training Generative Adversarial Networks (GANs) via SGD. We consider learning a target distribution modeled by a 1-layer Generator network with a non-linear activation function $\phi(\cdot)$ parametrized by a $d \times d$ weight matrix $\mathbf W_*$, i.e., $f_*(\mathbf x) = \phi(\mathbf W_* \mathbf x)$. Our main result is that training the Generator together with a Discriminator according to the Stochastic Gradient Descent-Ascent iteration proposed by Goodfellow et al. yields a Generator distribution that approaches the target distribution of $f_*$. Specifically, we can learn the target distribution within total-variation distance $\epsilon$ using $\tilde O(d^2/\epsilon^2)$ samples, which is (near-)information-theoretically optimal. Our results apply to a broad class of non-linear activation functions $\phi$, including ReLUs, and are enabled by a connection with truncated statistics and an appropriate design of the Discriminator network. Our approach relies on a bilevel optimization framework to show that vanilla SGDA works.
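To make the Stochastic Gradient Descent-Ascent (SGDA) iteration concrete, here is a minimal toy sketch in NumPy for a 1-layer ReLU generator $f(\mathbf x) = \mathrm{ReLU}(\mathbf W \mathbf x)$ with Gaussian latents. All hyperparameters, the choice of $\mathbf W_*$, and the use of a regularized *linear* discriminator $D(\mathbf y) = \mathbf v \cdot \mathbf y$ are illustrative assumptions on our part, not the paper's construction (the paper's Discriminator design is more involved); a linear discriminator can only match the first moments of the two distributions, which for Gaussian latents pins down the row norms of $\mathbf W$ rather than $\mathbf W_*$ itself.

```python
import numpy as np

rng = np.random.default_rng(0)
d, B = 2, 256                                # dimension, batch size (illustrative)
W_star = np.array([[1.5, 0.0], [0.5, 1.0]])  # hypothetical target weights W_*
W = np.eye(d)                                # generator weights, to be learned
v = np.zeros(d)                              # linear discriminator weights
lam, eta = 0.5, 0.05                         # discriminator regularizer, step size

relu = lambda z: np.maximum(z, 0.0)

# Minimax objective (illustrative):
#   min_W max_v  E[v . relu(W_* x)] - E[v . relu(W x)] - lam * ||v||^2
for _ in range(3000):
    Xr = rng.standard_normal((B, d))   # latents producing "real" samples
    Xf = rng.standard_normal((B, d))   # latents fed to the generator
    real = relu(Xr @ W_star.T)         # samples from the target f_*
    fake_pre = Xf @ W.T
    fake = relu(fake_pre)
    # ascent step on the discriminator parameters v
    v += eta * (real.mean(0) - fake.mean(0) - 2 * lam * v)
    # descent step on the generator weights W (gradient of -E[v . relu(W x)])
    grad_W = -((fake_pre > 0) * v).T @ Xf / B
    W -= eta * grad_W
```

Under this setup the generator's row norms drift toward those of $\mathbf W_*$ (since $\mathbb E[\mathrm{ReLU}(\mathbf w \cdot \mathbf x)] = \|\mathbf w\|/\sqrt{2\pi}$ for $\mathbf x \sim \mathcal N(0, I)$), illustrating how the simultaneous ascent/descent steps interact without any inner-loop optimization.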

Related articles:
arXiv:2109.03194 [cs.LG] (Published 2021-09-07)
On the Convergence of Decentralized Adaptive Gradient Methods
arXiv:1810.00122 [cs.LG] (Published 2018-09-29)
On the Convergence and Robustness of Batch Normalization
arXiv:1811.09358 [cs.LG] (Published 2018-11-23)
A Sufficient Condition for Convergences of Adam and RMSProp