arXiv:1906.07983 [stat.ML]

Explanations can be manipulated and geometry is to blame

Ann-Kathrin Dombrowski, Maximilian Alber, Christopher J. Anders, Marcel Ackermann, Klaus-Robert Müller, Pan Kessel

Published 2019-06-19 (Version 1)

Explanation methods aim to make neural networks more trustworthy and interpretable. In this paper, we demonstrate a property of explanation methods that is disconcerting for both of these purposes. Namely, we show that explanations can be manipulated arbitrarily by applying hardly perceptible perturbations to the input that keep the network's output approximately constant. We establish theoretically that this phenomenon can be related to certain geometrical properties of neural networks. This allows us to derive an upper bound on the susceptibility of explanations to manipulations. Based on this result, we propose effective mechanisms to enhance the robustness of explanations.
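The manipulation described in the abstract can be phrased as a small optimization problem: perturb the input so that a gradient-based explanation approaches an arbitrary target map while the model's output stays approximately unchanged. Below is a minimal PyTorch sketch of this idea; the input-gradient (saliency) explanation, the loss weights, and the function name manipulate_explanation are illustrative assumptions rather than the paper's exact procedure.

```python
import torch

def manipulate_explanation(model, x, target_map, steps=200, lr=1e-3, gamma=1e6):
    """Sketch: find a perturbed input whose saliency map approaches target_map
    while the model output stays close to the output on the original input x."""
    # Original output, treated as a fixed target for the output-preservation term.
    with torch.no_grad():
        y_orig = model(x)

    x_adv = x.clone().detach().requires_grad_(True)
    optimizer = torch.optim.Adam([x_adv], lr=lr)

    for _ in range(steps):
        optimizer.zero_grad()
        out = model(x_adv)
        # Saliency explanation: gradient of the output w.r.t. the input.
        # create_graph=True lets us differentiate through the explanation itself.
        expl = torch.autograd.grad(out.sum(), x_adv, create_graph=True)[0]
        # Match the target explanation while keeping the output approximately constant.
        loss = ((expl - target_map) ** 2).sum() + gamma * ((out - y_orig) ** 2).sum()
        loss.backward()
        optimizer.step()

    return x_adv.detach()
```

Note that for piecewise-linear (ReLU) networks the second derivatives this optimization relies on vanish almost everywhere, so such an attack is typically run on a smoothed copy of the model (e.g. with softplus activations); the resulting perturbation is then applied to the original network.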

Related articles: Most relevant | Search more
arXiv:1907.00825 [stat.ML] (Published 2019-07-01)
Time-to-Event Prediction with Neural Networks and Cox Regression
arXiv:2403.12187 [stat.ML] (Published 2024-03-18)
Approximation of RKHS Functionals by Neural Networks
arXiv:2312.08083 [stat.ML] (Published 2023-12-13)
Training of Neural Networks with Uncertain Data, A Mixture of Experts Approach