Opening the Black Box: Revealing Interpretable Sequence Motifs in Kernel-Based Learning Algorithms

1 Berlin Institute of Technology, 10587 Berlin, Germany ([email protected], {nico.goernitz,klaus-robert.mueller}@tu-berlin.de)
2 Department of Brain and Cognitive Engineering, Korea University, Anam-dong, Seongbuk-gu, Seoul 136-713, Republic of Korea
3 Memorial Sloan-Kettering Cancer Center, New York, NY 10065, USA ([email protected])
4 Humboldt University of Berlin, 10099 Berlin, Germany ([email protected])

Abstract. This work is in the context of kernel-based learning algorithms for sequence data. We present a probabilistic approach to automatically extract, from the output of such string-kernel-based learning algorithms, the subsequences—or motifs—truly underlying the machine’s predictions. The proposed framework views motifs as free parameters in a probabilistic model, which is solved through a global optimization approach. In contrast to prevalent approaches, the proposed method can discover even difficult, long motifs, and can be combined with any kernel-based learning algorithm that is based on an adequate sequence kernel. We show that, by using a discriminative kernel machine such as a support vector machine, the approach can reveal discriminative motifs underlying the kernel predictor. We demonstrate the efficacy of our approach through a series of experiments on synthetic and real data, including problems from handwritten digit recognition and a large-scale human splice site data set from the domain of computational biology.

1 Introduction

In view of the rapidly increasing amount of data collected in science and technology, effective automation of decisions is necessary. To this end, kernel-based methods [13,17,19,26,31,32] such as support vector machines (SVMs) [5,7] have found diverse applications owing to their distinct merits: decent computational complexity, high usability, and a solid mathematical foundation [24]. Kernel-based learning allows us to obtain more complex nonlinear learning machines from simple linear ones in a canonical way, since the learning and data-representation processes are decoupled in a modular fashion. Yet, after more than a decade of research, kernel methods are still widely regarded as black boxes, and it remains an unsolved problem to make their decisions accessible or interpretable to domain experts. This is especially pressing in the natural and life sciences, where the foremost aim is not maximum prediction accuracy but unveiling the underlying natural principles.

M.M.-C. Vidovic et al. In: A. Appice et al. (Eds.): ECML PKDD 2015, Part II, LNAI 9285, pp. 137–153. © Springer International Publishing Switzerland 2015. DOI: 10.1007/978-3-319-23525-7_9

In several important application fields, the data exhibits an inherent sequence structure. This includes DNA sequences in genomics, text data in natural language processing, and speech data in speech recognition. A state-of-the-art approach to learning from such sequence data is the weighted-degree (WD) kernel [4,27,28,31] in combination with a kernel-based learning machine such as an SVM. Given two d
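As a rough illustration of the kind of sequence kernel discussed here, the following is a minimal sketch of a standard weighted-degree kernel under the common weighting scheme β_k = 2(d − k + 1)/(d(d + 1)); the function name and interface are ours for illustration, not the authors' implementation:

```python
def wd_kernel(x, y, d, betas=None):
    """Weighted-degree (WD) string kernel (illustrative sketch).

    Sums, over substring lengths k = 1..d, a weight beta_k times the
    number of positions at which x and y carry the same k-mer.
    Both sequences must have equal length, as the WD kernel compares
    substrings position-wise.
    """
    assert len(x) == len(y), "WD kernel expects equal-length sequences"
    L = len(x)
    if betas is None:
        # Common default weighting: shorter matches contribute more.
        betas = [2.0 * (d - k + 1) / (d * (d + 1)) for k in range(1, d + 1)]
    value = 0.0
    for k in range(1, d + 1):
        # Count positions i where the k-mers of x and y agree exactly.
        matches = sum(1 for i in range(L - k + 1) if x[i:i + k] == y[i:i + k])
        value += betas[k - 1] * matches
    return value
```

For example, `wd_kernel("ACGT", "ACGT", d=2)` counts four matching 1-mers and three matching 2-mers, while two sequences with no common substrings yield a kernel value of zero. In practice such a kernel is plugged into an SVM as a precomputed Gram matrix.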