arXiv:1906.03499 [cs.LG]

ML-LOO: Detecting Adversarial Examples with Feature Attribution

Puyudi Yang, Jianbo Chen, Cho-Jui Hsieh, Jane-Ling Wang, Michael I. Jordan

Published 2019-06-08 (Version 1)

Deep neural networks obtain state-of-the-art performance on a range of tasks. However, they are easily fooled by adding a small adversarial perturbation to the input; on image data, the perturbation is often imperceptible to humans. We observe that the feature attributions of adversarially crafted examples differ significantly from those of the original examples. Based on this observation, we introduce a new framework that detects adversarial examples by thresholding a scale estimate of the feature attribution scores. Furthermore, we extend our method to multi-layer feature attributions in order to handle attacks with mixed confidence levels. Through extensive experiments, our method outperforms state-of-the-art detection methods at distinguishing adversarial examples generated by popular attack methods on a variety of real data sets. In particular, our method detects adversarial examples of mixed confidence levels and transfers between different attack methods.
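The core idea of the abstract (flag an input when the dispersion of its feature attribution scores is unusually large) can be illustrated with a minimal sketch. The snippet below assumes a model object with a `predict_proba` method, uses a simple leave-one-out attribution (the drop in the predicted class probability when a feature is masked), and takes the interquartile range as the scale estimate; the masking baseline and the threshold calibration are illustrative choices, not the paper's exact procedure.

```python
import numpy as np

def loo_attribution(model, x, baseline=0.0):
    """Leave-one-out attribution: drop in the predicted class probability
    when each feature is replaced by a baseline value."""
    p = model.predict_proba(x[None])[0]
    cls = int(np.argmax(p))
    flat = x.ravel()
    attributions = np.empty(flat.size)
    for i in range(flat.size):
        x_masked = flat.copy()
        x_masked[i] = baseline
        p_masked = model.predict_proba(x_masked.reshape(x.shape)[None])[0]
        attributions[i] = p[cls] - p_masked[cls]
    return attributions

def dispersion_score(attributions):
    """Interquartile range of the attribution map, used here as the
    scale estimate that is thresholded."""
    q75, q25 = np.percentile(attributions, [75, 25])
    return q75 - q25

def is_adversarial(model, x, threshold):
    """Flag x as adversarial if its attribution dispersion exceeds a
    threshold calibrated on clean examples (e.g., a high percentile
    of dispersion scores on held-out natural data)."""
    return dispersion_score(loo_attribution(model, x)) > threshold
```

In practice the threshold would be chosen from the distribution of dispersion scores on clean data, and the paper's multi-layer extension would compute such statistics from attributions at several intermediate layers rather than the input alone.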

Related articles:
arXiv:2410.17442 [cs.LG] (Published 2024-10-22)
Detecting Adversarial Examples
arXiv:2206.08738 [cs.LG] (Published 2022-06-17)
Detecting Adversarial Examples in Batches -- a geometrical approach
arXiv:2107.11630 [cs.LG] (Published 2021-07-24)
Detecting Adversarial Examples Is (Nearly) As Hard As Classifying Them