arXiv Analytics


arXiv:2104.11315 [cs.LG]

SPECTRE: Defending Against Backdoor Attacks Using Robust Statistics

Jonathan Hayase, Weihao Kong, Raghav Somani, Sewoong Oh

Published 2021-04-22 (Version 1)

Modern machine learning increasingly requires training on a large collection of data from multiple sources, not all of which can be trusted. A particularly concerning scenario is when a small fraction of poisoned data changes the behavior of the trained model when triggered by an attacker-specified watermark. Such a compromised model may be deployed unnoticed, since the model is otherwise accurate. There have been promising attempts to use the intermediate representations of such a model to separate corrupted examples from clean ones. However, these defenses work only when a certain spectral signature of the poisoned examples is large enough for detection, and a wide range of attacks cannot be protected against by the existing defenses. We propose a novel defense algorithm using robust covariance estimation to amplify the spectral signature of corrupted data. This defense provides a clean model, completely removing the backdoor, even in regimes where previous methods have no hope of detecting the poisoned examples. Code and pre-trained models are available at https://github.com/SewoongLab/spectre-defense.
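To make the "spectral signature" idea concrete, the following is a minimal NumPy sketch of the baseline defense the abstract refers to: score each example by its squared projection onto the top singular direction of the centered intermediate representations, then discard the highest-scoring fraction. This is a simplified illustration of the prior spectral-signature approach, not the SPECTRE algorithm itself; SPECTRE additionally uses robust covariance estimation (and a more refined scoring rule) to amplify weak signatures that this naive version misses. The function names and parameters here are illustrative, not from the paper's code.

```python
import numpy as np

def spectral_signature_scores(reps):
    """Score examples by their spectral signature.

    reps: (n, d) array of intermediate-layer representations.
    Returns one non-negative score per example; large scores
    suggest the example lies along an outlier direction.
    """
    # Center the representations.
    centered = reps - reps.mean(axis=0)
    # Top right-singular vector = direction of maximum variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    # Squared projection onto that direction.
    return (centered @ vt[0]) ** 2

def filter_suspects(reps, eps=0.1):
    """Return indices to keep after dropping the eps-fraction of
    examples with the largest spectral-signature scores."""
    scores = spectral_signature_scores(reps)
    k = int(np.ceil(eps * len(reps)))
    return np.argsort(scores)[: len(reps) - k]
```

When the poisoned examples form a well-separated cluster, their representations dominate the top singular direction and this filter removes them; the paper's point is that for subtler attacks the signature is too weak for this test, which motivates amplifying it via robust covariance estimation before scoring.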

Related articles:
arXiv:2310.11594 [cs.LG] (Published 2023-10-17)
Adversarial Robustness Unhardening via Backdoor Attacks in Federated Learning
arXiv:2204.14017 [cs.LG] (Published 2022-04-29)
Backdoor Attacks in Federated Learning by Rare Embeddings and Gradient Ensembling
arXiv:2003.08904 [cs.LG] (Published 2020-03-19)
RAB: Provable Robustness Against Backdoor Attacks