Learning credible DNNs via incorporating prior knowledge and model local explanation
Mengnan Du · Ninghao Liu · Fan Yang · Xia Hu
Received: 8 January 2020 / Revised: 25 September 2020 / Accepted: 4 October 2020
© Springer-Verlag London Ltd., part of Springer Nature 2020
Abstract
Recent studies have shown that state-of-the-art DNNs are not always credible, despite their impressive performance on the hold-out test sets of a variety of tasks. These models tend to exploit dataset shortcuts to make predictions rather than learn the underlying task. This non-credibility can lead to low generalization, adversarial vulnerability, and algorithmic discrimination in DNN models. In this paper, we propose CREX to develop more credible DNNs. The high-level idea of CREX is to encourage DNN models to focus on the evidence that actually matters for the task at hand and to avoid overfitting to data-dependent shortcuts. Specifically, during DNN training, CREX directly regularizes the local explanation with expert rationales, i.e., subsets of features highlighted by domain experts as justifications for predictions, to enforce alignment between local explanations and rationales. Even when rationales are not available, CREX remains useful by requiring the generated explanations to be sparse. In addition, CREX is widely applicable to different network architectures, including CNNs, LSTMs, and attention models. Experimental results on several text classification datasets demonstrate that CREX increases the credibility of DNNs. Comprehensive analysis further shows three meaningful improvements from CREX: (1) it significantly increases DNN accuracy on new and previously unseen data beyond the test set, (2) it enhances the fairness of DNNs under the equality-of-opportunity metric and reduces models' discrimination toward certain demographic groups, and (3) it improves the robustness of DNN models against adversarial attacks. These results highlight the advantages of the increased credibility afforded by CREX.

Keywords Deep neural network · Credibility · Prior knowledge · Generalization · Fairness · Adversarial
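To make the idea concrete, the sketch below illustrates one way such a training objective can be set up: a task loss plus a regularizer that aligns a differentiable local explanation with expert rationales, falling back to a sparsity penalty when no rationales exist. This is an illustrative approximation only, not the authors' implementation; it uses input gradients as a simple stand-in for the local explanation, and the names crex_style_loss, rationale_mask, lambda_align, and lambda_sparse are hypothetical.

import torch
import torch.nn.functional as F

def crex_style_loss(model, x, y, rationale_mask=None,
                    lambda_align=1.0, lambda_sparse=0.01):
    # x: (batch, num_features) continuous inputs (e.g., embedded text);
    # y: (batch,) class labels.
    # rationale_mask: same shape as x, 1 on expert-highlighted features,
    # 0 elsewhere; None when no rationales are available.
    x = x.clone().detach().requires_grad_(True)
    logits = model(x)
    task_loss = F.cross_entropy(logits, y)

    # Input-gradient magnitude as a differentiable stand-in for the
    # model's local explanation of its prediction.
    grads = torch.autograd.grad(task_loss, x, create_graph=True)[0]
    explanation = grads.abs()

    if rationale_mask is not None:
        # Penalize explanation weight that falls outside the rationale,
        # pushing the model to rely on expert-endorsed evidence.
        align_penalty = (explanation * (1.0 - rationale_mask)).sum(dim=1).mean()
        return task_loss + lambda_align * align_penalty

    # No rationales available: an L1 sparsity penalty on the explanation
    # discourages diffuse, shortcut-driven attributions.
    return task_loss + lambda_sparse * explanation.sum(dim=1).mean()

In this sketch, lambda_align trades off task accuracy against agreement with expert rationales, and the sparsity fallback mirrors the rationale-free variant described above; the actual CREX formulation plugs in explanation mechanisms suited to CNN, LSTM, and attention architectures.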
Mengnan Du (corresponding author) [email protected]
Ninghao Liu [email protected]
Fan Yang [email protected]
Xia Hu [email protected]

Department of Computer Science and Engineering, Texas A&M University, College Station, USA
1 Introduction

Deep neural networks (DNNs) have achieved super-human performance in many applications, including complex vision tasks such as object recognition and semantic segmentation [6,16], and high-level language tasks such as reading comprehension, question answering, and natural language understanding [8,55]. Nevertheless, recent studies show that these DNNs might not be credible [13]. Some of their "success" can be attributed to exploiting superficial patterns (or shortcuts) in the data rather than capturing the underlying task. This non-credibility issue has been observed in a variety of DNN systems. The most representative example is perh