arXiv:2107.11671 [cs.LG]

Adversarial training may be a double-edged sword

Ali Rahmati, Seyed-Mohsen Moosavi-Dezfooli, Huaiyu Dai

Published 2021-07-24 (Version 1)

Adversarial training has been shown to be an effective approach for improving the robustness of image classifiers against white-box attacks. However, its effectiveness against black-box attacks is more nuanced. In this work, we demonstrate that some geometric consequences of adversarial training on the decision boundary of deep networks give an edge to certain types of black-box attacks. In particular, we define a metric called robustness gain to show that while adversarial training dramatically improves robustness in white-box scenarios, it may not provide as strong a robustness gain against the more realistic decision-based black-box attacks. Moreover, we show that even minimal-perturbation white-box attacks can converge faster against adversarially-trained neural networks than against regularly-trained ones.
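
The abstract does not give the formula for robustness gain; below is a minimal sketch of one plausible formalization in Python, assuming the gain is the ratio of median minimal adversarial perturbation norms between an adversarially-trained model and a regularly-trained one (the function name, the ratio-of-medians definition, and the numbers are illustrative assumptions, not the paper's stated definition):

import numpy as np

def robustness_gain(eps_regular, eps_adv_trained):
    # Hypothetical metric (assumed, not the paper's stated definition):
    # the ratio of the median minimal adversarial perturbation norm of
    # the adversarially-trained model to that of the regular model.
    # A value greater than 1 means the attack needs larger perturbations
    # against the adversarially-trained model, i.e. robustness improved.
    return np.median(eps_adv_trained) / np.median(eps_regular)

# Illustrative per-sample minimal L2 perturbation norms found by some attack.
eps_regular = np.array([0.5, 0.7, 0.6, 0.8])  # regularly-trained model
eps_adv = np.array([2.0, 1.8, 2.2, 1.9])      # adversarially-trained model
print(robustness_gain(eps_regular, eps_adv))  # approx. 3.0: a large gain

Under this reading, the paper's claim is that the same pair of models can show a large gain when the perturbation norms come from a white-box attack, but a much smaller one when they come from a decision-based black-box attack.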

Comments: Presented as a RobustML workshop paper at ICLR 2021
Categories: cs.LG, cs.CR, cs.CV
Related articles:
arXiv:2102.13624 [cs.LG] (Published 2021-02-26)
What Doesn't Kill You Makes You Robust(er): Adversarial Training against Poisons and Backdoors
arXiv:2106.01606 [cs.LG] (Published 2021-06-03)
Exploring Memorization in Adversarial Training
arXiv:1811.09716 [cs.LG] (Published 2018-11-23)
Robustness via curvature regularization, and vice versa