Journal article 128 views
Jacobian norm with Selective Input Gradient Regularization for interpretable adversarial defense
Pattern Recognition, Volume: 145, Start page: 109902
Proof under embargo until: 22nd August 2024
Deep neural networks (DNNs) can be easily deceived by imperceptible alterations known as adversarial examples. These examples can lead to misclassification, posing a significant threat to the reliability of deep learning systems in real-world applications. Adversarial training (AT) is a popular tech...
|Published in:||Pattern Recognition|
Check full text
No Tags, Be the first to tag this record!
Deep neural networks (DNNs) can be easily deceived by imperceptible alterations known as adversarial examples. These examples can lead to misclassification, posing a significant threat to the reliability of deep learning systems in real-world applications. Adversarial training (AT) is a popular technique used to enhance robustness by training models on a combination of corrupted and clean data. However, existing AT-based methods often struggle to handle transferred adversarial examples that can fool multiple defense models, thereby falling short of meeting the generalization requirements for real-world scenarios. Furthermore, AT typically fails to provide interpretable predictions, which are crucial for domain experts seeking to understand the behavior of DNNs. To overcome these challenges, we present a novel approach called Jacobian norm and Selective Input Gradient Regularization (J-SIGR). Our method leverages Jacobian normalization to improve robustness and introduces regularization of perturbation-based saliency maps, enabling interpretable predictions. By adopting J-SIGR, we achieve enhanced defense capabilities and promote high interpretability of DNNs. We evaluate the effectiveness of J-SIGR across various architectures by subjecting it to powerful adversarial attacks. Our experimental evaluations provide compelling evidence of the efficacy of J-SIGR against transferred adversarial attacks, while preserving interpretability. The project code can be found at https://github.com/Lywu-github/jJ-SIGR.git.
Selective input gradient regularization, Jacobian normalization, Adversarial robustness
Faculty of Science and Engineering
This work was partially supported by NSFC U19A2073 , 62002096, 62001394, 62372150, 62176086.