BUZz: BUffer Zones for defending adversarial examples in image classification

3 Oct 2019  ·  Kaleel Mahmood, Phuong Ha Nguyen, Lam M. Nguyen, Thanh Nguyen, Marten van Dijk

We propose a novel defense against all existing gradient-based adversarial attacks on deep neural networks for image classification problems. Our defense is based on a combination of deep neural networks and simple image transformations. While straightforward to implement, this defense yields a unique security property which we term buffer zones. We argue that our defense based on buffer zones offers significant improvements over state-of-the-art defenses. We achieve this improvement even when the adversary has access to the {\em entire} original training data set and unlimited query access to the defense. We verify our claim through experiments on Fashion-MNIST and CIFAR-10: we demonstrate a $<11\%$ attack success rate -- significantly lower than what other well-known state-of-the-art defenses offer -- at the price of only an $11$-$18\%$ drop in clean accuracy. Using a new intuitive metric, we explain why this trade-off offers a significant improvement over prior work.
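The abstract does not spell out the construction, but the stated idea of combining several networks with simple image transformations and abstaining inside the resulting "buffer zones" can be sketched roughly as below. The shift transformations, the unanimity voting rule, and the stand-in linear models are assumptions made for illustration only, not the paper's exact design.

```python
# Minimal sketch of a buffer-zone style defense (illustrative, not the paper's
# exact configuration): several classifiers, each preceded by a different fixed
# image transformation, and the ensemble returns a class only when every member
# agrees; any disagreement falls into the "buffer zone" and the defense abstains.

import numpy as np

def shift_image(x, dx, dy):
    """Circularly shift an HxWxC image by (dx, dy) pixels (placeholder transform)."""
    return np.roll(np.roll(x, dx, axis=1), dy, axis=0)

def predict_with_buffer_zone(models, transforms, x, abstain_label=-1):
    """Return a class only if all transformed-input models agree; otherwise abstain."""
    votes = []
    for model, transform in zip(models, transforms):
        scores = model(transform(x))          # each model sees its own transform
        votes.append(int(np.argmax(scores)))
    # Unanimous agreement -> prediction; any disagreement -> buffer zone (abstain).
    return votes[0] if len(set(votes)) == 1 else abstain_label

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Stand-in "models": random linear scorers over flattened 32x32x3 images.
    weights = [rng.standard_normal((10, 32 * 32 * 3)) for _ in range(3)]
    models = [lambda img, W=W: W @ img.ravel() for W in weights]
    transforms = [
        lambda img: img,                       # identity
        lambda img: shift_image(img, 1, 0),    # small horizontal shift
        lambda img: shift_image(img, 0, 1),    # small vertical shift
    ]
    x = rng.random((32, 32, 3))
    print(predict_with_buffer_zone(models, transforms, x))
```

In this kind of scheme, an adversarial perturbation crafted against one transformed view tends to push the ensemble members toward different labels, so the input lands in the abstain region rather than being misclassified; the cost is the drop in clean accuracy whenever benign inputs also trigger disagreement.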
