Anytime mining of sequential discriminative patterns in labeled sequences

  • PDF / 3,843,075 Bytes
  • 38 Pages / 439.37 x 666.142 pts Page_size
  • 88 Downloads / 211 Views

DOWNLOAD

REPORT


Anytime mining of sequential discriminative patterns in labeled sequences Romain Mathonat1,2 Mehdi Kaytoue1,3

· Diana Nurbakova1 · Jean-François Boulicaut1 ·

Received: 5 November 2019 / Revised: 15 October 2020 / Accepted: 18 October 2020 © Springer-Verlag London Ltd., part of Springer Nature 2020

Abstract It is extremely useful to exploit labeled datasets not only to learn models and perform predictive analytics but also to improve our understanding of a domain and its available targeted classes. The subgroup discovery task has been considered for more than two decades. It concerns the discovery of patterns covering sets of objects having interesting properties, e.g., they characterize or discriminate a given target class. Though many subgroup discovery algorithms have been proposed for both transactional and numerical data, discovering subgroups within labeled sequential data has been much less studied. First, we propose an anytime algorithm SeqScout that discovers interesting subgroups w.r.t. a chosen quality measure. This is a sampling algorithm that mines discriminant sequential patterns using a multi-armed bandit model. For a given budget, it finds a collection of local optima in the search space of descriptions and thus, subgroups. It requires a light configuration and is independent from the quality measure used for pattern scoring. We also introduce a second anytime algorithm MCTSExtent that pushes further the idea of a better trade-off between exploration and exploitation of a sampling strategy over the search space. To the best of our knowledge, this is the first time that the Monte Carlo Tree Search framework is exploited in a sequential data mining setting. We have conducted a thorough and comprehensive evaluation of our algorithms on several datasets to illustrate their added value, and we discuss their qualitative and quantitative results.

B

Romain Mathonat [email protected] Diana Nurbakova [email protected] Jean-François Boulicaut [email protected] Mehdi Kaytoue [email protected]

1

Université de Lyon, CNRS, INSA Lyon, LIRIS, UMR5205, 69621 Lyon, France

2

Atos, 69100 Villeurbanne, France

3

Infologic R&D, 69007 Lyon, France

123

R. Mathonat et al.

Keywords Pattern mining · Subgroup discovery · Upper confidence bound · Monte Carlo tree search · Multi-armed bandit

1 Introduction In many data science projects, we have to process labeled data and it is often valuable to discover descriptions, say patterns, that discriminate well some classes. This can be used to support various machine learning techniques aiming at predicting the class label for unseen objects (i.e., learning models). It is also interesting per se since the language for descriptions is, by design, readable and interpretable by analysts and data owners. Therefore, it can be used to explore and better understand a given phenomenon thanks to collected labeled data, and it can also provide a solid foundation for building new relevant features [43]. The search for such patterns has been re