PGD-2 can be better than FGSM + GradAlign

29 Sep 2021 · Tianhang Zheng, Baochun Li

One major issue of adversarial training (AT) with the fast gradient sign method (FGSM AT) is the phenomenon of catastrophic overfitting, in which the trained model suddenly loses its robustness over a single epoch. Beyond FGSM AT, Andriushchenko & Flammarion (2020) observed that two-step projected gradient descent adversarial training (PGD-2 AT) also suffers from catastrophic overfitting under large $\ell_\infty$ perturbations. To prevent catastrophic overfitting, Andriushchenko & Flammarion (2020) proposed a gradient alignment regularization method (GradAlign) and claimed that GradAlign prevents catastrophic overfitting in both FGSM AT and PGD-2 AT. In this paper, we show that PGD-2 AT with random initialization (PGD-2-RS AT) and attack step size $\alpha=1.25\epsilon/2$ requires only about half the computational cost of FGSM + GradAlign AT and can in fact avoid catastrophic overfitting for large $\ell_\infty$ perturbations. We hypothesize that if FGSM-RS AT with $\alpha=1.25\epsilon/2$ avoids catastrophic overfitting for $\ell_\infty$ perturbation size $\epsilon/2$, then PGD-2-RS AT with $\alpha=1.25\epsilon/2$ may avoid catastrophic overfitting for $\ell_\infty$ perturbation size $\epsilon$. The intuitions behind this empirical hypothesis lead to a more unexpected finding: if we add random noise drawn from the uniform distribution $\mathcal{U}(-\epsilon/2, \epsilon/2)$ to the perturbations before each step of PGD-2 with $\alpha=1.25\epsilon/2$, instead of initializing the perturbations with random noise from $\mathcal{U}(-\epsilon, \epsilon)$ at the beginning (i.e., the conventional random initialization scheme), the resulting AT method also avoids catastrophic overfitting and even achieves better robust accuracy in most cases. We refer to this method as Quasi-PGD-2-RS AT. Extensive evaluations demonstrate that PGD-2-RS AT and Quasi-PGD-2-RS AT with $\alpha=1.25\epsilon/2$ achieve better performance and efficiency than FGSM + GradAlign AT. Notably, Quasi-PGD-2-RS AT achieves robust accuracy against PGD-50-10 comparable to that of PGD-3-RS AT on CIFAR10 and SVHN, and it also achieves approximately $18\%$ top-1 and $38\%$ top-5 robust accuracy against PGD-50-10 at $\epsilon=8/255$ on ImageNet.
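For concreteness, the PyTorch sketch below contrasts the two perturbation schemes the abstract describes: PGD-2-RS, which draws a single random start from $\mathcal{U}(-\epsilon, \epsilon)$, and Quasi-PGD-2-RS, which instead adds fresh noise from $\mathcal{U}(-\epsilon/2, \epsilon/2)$ before each of the two gradient steps. This is our reading of the abstract, not the authors' released code (none is linked); the function name `pgd2_perturb`, the `quasi` flag, and the clamping of inputs to the valid image range $[0, 1]$ are our own illustrative choices.

```python
import torch
import torch.nn.functional as F

def pgd2_perturb(model, x, y, eps, quasi=False):
    """Sketch of a 2-step perturbation for adversarial training.

    quasi=False: PGD-2-RS      -- one random start from U(-eps, eps).
    quasi=True:  Quasi-PGD-2-RS -- fresh noise from U(-eps/2, eps/2)
                 added to the perturbation before each of the 2 steps.
    """
    alpha = 1.25 * eps / 2  # step size used throughout the abstract
    delta = torch.zeros_like(x)
    if not quasi:
        # Conventional random initialization, applied once at the start.
        delta = delta.uniform_(-eps, eps)
    for _ in range(2):
        if quasi:
            # Per-step noise injection; projecting back onto the
            # eps-ball afterwards is our assumption, not stated above.
            delta = (delta + torch.empty_like(x).uniform_(-eps / 2, eps / 2))
            delta = delta.clamp(-eps, eps)
        delta.requires_grad_(True)
        loss = F.cross_entropy(model(torch.clamp(x + delta, 0, 1)), y)
        grad = torch.autograd.grad(loss, delta)[0]
        with torch.no_grad():
            # Signed gradient step, then projection onto the eps-ball.
            delta = (delta + alpha * grad.sign()).clamp(-eps, eps)
    return torch.clamp(x + delta.detach(), 0, 1)
```

In a training loop, the model would then be updated on `model(pgd2_perturb(model, x, y, eps))` with the clean labels `y`, as in standard multi-step adversarial training; the two schemes differ only in where the randomness enters.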
