{ "id": "2308.03792", "version": "v1", "published": "2023-08-04T14:41:18.000Z", "updated": "2023-08-04T14:41:18.000Z", "title": "Multi-attacks: Many images $+$ the same adversarial attack $\\to$ many target labels", "authors": [ "Stanislav Fort" ], "comment": "Code at https://github.com/stanislavfort/multi-attacks", "categories": [ "cs.CV", "cs.CR", "cs.LG" ], "abstract": "We show that we can easily design a single adversarial perturbation $P$ that changes the class of $n$ images $X_1,X_2,\\dots,X_n$ from their original, unperturbed classes $c_1, c_2,\\dots,c_n$ to desired (not necessarily all the same) classes $c^*_1,c^*_2,\\dots,c^*_n$ for up to hundreds of images and target classes at once. We call these \\textit{multi-attacks}. Characterizing the maximum $n$ we can achieve under different conditions such as image resolution, we estimate the number of regions of high class confidence around a particular image in the space of pixels to be around $10^{\\mathcal{O}(100)}$, posing a significant problem for exhaustive defense strategies. We show several immediate consequences of this: adversarial attacks that change the resulting class based on their intensity, and scale-independent adversarial examples. To demonstrate the redundancy and richness of class decision boundaries in the pixel space, we look for its two-dimensional sections that trace images and spell words using particular classes. We also show that ensembling reduces susceptibility to multi-attacks, and that classifiers trained on random labels are more susceptible. Our code is available on GitHub.", "revisions": [ { "version": "v1", "updated": "2023-08-04T14:41:18.000Z" } ], "analyses": { "keywords": [ "adversarial attack", "target labels", "multi-attacks", "high class confidence", "single adversarial perturbation" ], "tags": [ "github project" ], "note": { "typesetting": "TeX", "pages": 0, "language": "en", "license": "arXiv", "status": "editable" } } }