arXiv:2307.03305 [cs.LG]
A Vulnerability of Attribution Methods Using Pre-Softmax Scores
Published 2023-07-06, Version 1
We discuss a vulnerability involving a category of attribution methods used to explain the outputs of convolutional neural networks working as classifiers. It is known that networks of this type are vulnerable to adversarial attacks, in which imperceptible perturbations of the input may alter the outputs of the model. In contrast, here we focus on the effects that small modifications of the model may have on the attribution method without altering the model outputs.
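The core observation can be illustrated with a minimal sketch. The construction below is my illustrative choice, not necessarily the paper's: softmax is invariant to adding the same value to every logit, so appending an input-dependent term g(x) to all pre-softmax scores leaves every output probability unchanged while altering the gradients (and hence gradient-based attributions) of those scores.

```python
import numpy as np

# A tiny linear "classifier" z = W x (illustrative stand-in for a CNN).
rng = np.random.default_rng(0)
W = rng.normal(size=(3, 5))   # 3 classes, 5 input features
v = rng.normal(size=5)        # attack direction (hypothetical)
x = rng.normal(size=5)        # an arbitrary input

def softmax(z):
    e = np.exp(z - z.max())   # shift for numerical stability
    return e / e.sum()

z_orig = W @ x                # original pre-softmax scores
z_mod = W @ x + (v @ x)       # every logit shifted by g(x) = v . x

# Model outputs are identical for every input...
assert np.allclose(softmax(z_orig), softmax(z_mod))

# ...but the gradient of a pre-softmax score w.r.t. the input changes:
grad_orig = W[0]              # d z_0 / d x for the linear model
grad_mod = W[0] + v           # d (z_0 + v . x) / d x
print(np.abs(grad_orig - grad_mod).max())  # nonzero: attributions differ
```

Post-softmax attributions would be immune to this particular shift, since the shifted term cancels in the softmax quotient.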
Comments: 7 pages, 5 figures
Related articles:
arXiv:2306.13197 [cs.LG] (Published 2023-06-22)
Pre or Post-Softmax Scores in Gradient-based Attribution Methods, What is Best?
arXiv:2103.13533 [cs.LG] (Published 2021-03-25)
Symmetry-Preserving Paths in Integrated Gradients
arXiv:2104.11695 [cs.LG] (Published 2021-04-23)
A Framework for Unsupervised Classification and Data Mining of Tweets about Cyber Vulnerabilities