CHIRPS: Explaining random forest classification

Julian Hatwell1 · Mohamed Medhat Gaber1 · R. Muhammad Atif Azad1

1 Birmingham City University, Birmingham B5 5JU, UK
* Julian Hatwell, [email protected], http://www.bcu.ac.uk

© The Author(s) 2020

Abstract

Modern machine learning methods typically produce "black box" models that are opaque to interpretation. Yet, demand for them is increasing in Human-in-the-Loop processes, that is, processes that require a human agent to verify, approve or reason about automated decisions before they can be applied. To facilitate this interpretation, we propose Collection of High Importance Random Path Snippets (CHIRPS), a novel algorithm for explaining random forest classification per data instance. CHIRPS extracts a decision path from each tree in the forest that contributes to the majority classification, and then uses frequent pattern mining to identify the most commonly occurring split conditions. A simple, conjunctive-form rule is then constructed whose antecedent terms are derived from the attributes that had the most influence on the classification. This rule is returned alongside estimates of its precision and coverage on the training data, together with counter-factual details. An experimental study involving nine data sets shows that classification rules returned by CHIRPS have a precision at least as high as the state of the art when evaluated on unseen data (0.91–0.99) while offering much greater coverage (0.04–0.54). Furthermore, CHIRPS uniquely controls against under- and over-fitting solutions by maximising novel objective functions that are better suited to the local (per instance) explanation setting.

Keywords: XAI · Model interpretability · Random forests · Classification · Frequent patterns
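To make the workflow above concrete, the following is a minimal sketch assuming a scikit-learn RandomForestClassifier; it is not the authors' released implementation. It walks the decision path of every tree that agrees with the majority vote, keeps the most frequently occurring split conditions (a crude stand-in for the paper's frequent pattern mining step), and reports the resulting conjunctive rule with its precision and coverage on the training data; counter-factual reporting is omitted. The function name explain_instance, the top_k parameter and the breast cancer example data are illustrative choices only.

from collections import Counter

import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier


def explain_instance(forest, X_train, y_train, x, top_k=3):
    """Return a conjunctive rule and its precision/coverage for one instance x."""
    x = x.reshape(1, -1)
    majority = forest.predict(x)[0]
    # Individual trees predict encoded class indices, so compare against the
    # index of the majority class in forest.classes_.
    majority_idx = np.flatnonzero(forest.classes_ == majority)[0]

    # 1. Walk the decision path of every tree that agrees with the majority
    #    vote and record the split conditions that the instance satisfies.
    counts = Counter()   # how often each (feature, direction) pair occurs
    bounds = {}          # tightest threshold seen for each pair
    for tree in forest.estimators_:
        if tree.predict(x)[0] != majority_idx:
            continue
        t = tree.tree_
        for node in tree.decision_path(x).indices:
            if t.children_left[node] == -1:        # leaf node: no split
                continue
            f, thr = t.feature[node], t.threshold[node]
            if x[0, f] <= thr:                     # instance went left
                key = (f, "<=")
                bounds[key] = min(thr, bounds.get(key, np.inf))
            else:                                  # instance went right
                key = (f, ">")
                bounds[key] = max(thr, bounds.get(key, -np.inf))
            counts[key] += 1

    # 2. Keep the most frequent conditions (a crude stand-in for frequent
    #    pattern mining) and join them into a conjunctive rule.
    rule = [(f, op, bounds[(f, op)]) for (f, op), _ in counts.most_common(top_k)]

    # 3. Estimate the rule's precision and coverage on the training data.
    mask = np.ones(len(X_train), dtype=bool)
    for f, op, thr in rule:
        if op == "<=":
            mask &= X_train[:, f] <= thr
        else:
            mask &= X_train[:, f] > thr
    coverage = mask.mean()
    precision = (y_train[mask] == majority).mean() if mask.any() else 0.0
    return rule, precision, coverage


if __name__ == "__main__":
    X, y = load_breast_cancer(return_X_y=True)
    rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
    rule, prec, cov = explain_instance(rf, X, y, X[0])
    print("rule:", rule, "precision:", round(prec, 3), "coverage:", round(cov, 3))

In the full method, continuous thresholds are handled so that similar split conditions can be counted together across trees, and the rule is scored against objective functions that balance precision and coverage; the frequency count above is only the simplest approximation of that step.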

1 Introduction

Explainable Artificial Intelligence (XAI) is no longer just a research question (Doshi-Velez and Kim 2017); it is a concern of national defence and industrial strategy (Gunning 2017; Goodman and Flaxman 2016) and a topic of regular public discourse (Tierney 2017; O'Neil and Hayworth 2018). The challenge—to make AI explainable—arises because of a cognitive-representational mismatch; modern machine learning (ML) methods and models operate on dimensions, complexity and modes of knowledge representation that make them
opaque to human understanding. Such models are termed "black boxes" (Freitas 2014; Lipton 2016). Some have argued that classification performance improves only negligibly when complex, black box models are used instead of classical methods such as linear discriminant analysis (Rudin 2018; Hand 2006); nevertheless, black box models such as Random Forests, Gradient Boosting Machines, Support Vector Machines and Neural Networks remain the first-choice methods for many applications. This preference may be due in part to intense performance competitions, such as those hosted by Kaggle, as well as the demands of commercial and critical applications.