What is Interpretability?


Adrian Erasmus · Tyler D. P. Brunet · Eyal Fisher

Received: 6 July 2020 / Accepted: 19 October 2020 / © The Author(s) 2020

Abstract We argue that artificial networks are explainable and offer a novel theory of interpretability. Two sets of conceptual questions are prominent in theoretical engagements with artificial neural networks, especially in the context of medical artificial intelligence: (1) Are networks explainable, and if so, what does it mean to explain the output of a network? And (2) what does it mean for a network to be interpretable? We argue that accounts of “explanation” tailored specifically to neural networks have ineffectively reinvented the wheel. In response to (1), we show how four familiar accounts of explanation apply to neural networks as they would to any scientific phenomenon. We diagnose the confusion about explaining neural networks within the machine learning literature as an equivocation on “explainability,” “understandability” and “interpretability.” To remedy this, we distinguish between these notions, and answer (2) by offering a theory and typology of interpretation in machine learning. Interpretation is something one does to an explanation with the aim of producing another, more understandable, explanation. As with explanation, there are various concepts and methods involved in interpretation: Total or Partial, Global or Local, and Approximative or Isomorphic. Our account of “interpretability” is consistent with uses in the machine learning literature, in keeping with the philosophy of explanation and understanding, and pays special attention to medical artificial intelligence systems.

Keywords Interpretability · Explainability · XAI · Medical AI

1 Introduction

Two sets of conceptual problems have gained prominence in theoretical engagements with artificial neural networks (ANNs). The first is whether ANNs are explainable, and, if they are, what it means to explain their outputs. The second is what it means for an ANN to be interpretable. In this paper, we argue that ANNs are, in one sense, already explainable and propose a novel theory of interpretability. These issues often arise in discussions of medical AI systems (MAIS), where reliance on artificial decision making in medical contexts could have serious consequences. There is evidence that some of these systems have superior diagnostic and predictive capabilities when compared to human experts (Esteva et al. 2017; Fleming 2018; Rajpurkar et al. 2017; Tschandl et al. 2019). Indeed, many MAIS are already deployed in the clinic, including algorithms aimed at diagnosing retinal disease (De Fauw et al. 2018), and breast cancer treatment recommendations (Somashekhar et al. 2018) and screening (McKinney et al. 2020). For some, accomplishments like these are a precursor to the promising incorporation of machine learning (ML) into effective medical decision making (Wiens and