Recency-based sequential pattern mining in multiple event sequences

Hakkyu Kim¹ · Dong-Wan Choi¹

Received: 1 November 2019 / Accepted: 9 September 2020

© The Author(s), under exclusive licence to Springer Science+Business Media LLC, part of Springer Nature 2020

Abstract

The standard sequential pattern mining scheme hardly considers the positions of events in a sequence, which makes it difficult to focus on more interesting patterns that better represent the causal relationships between events. Without quantifying how close two events are in a sequence, we may fail to evaluate how likely an event is to be caused by the others in the pattern, which is a severe drawback for applications such as prediction. Motivated by this, we propose the recency-based sequential pattern mining scheme together with a novel measure of pattern interestingness that effectively captures recency as well as frequency. To efficiently extract all the recency-based sequential patterns, we devise a mining algorithm, called Recency-based Frequent pattern Miner (RF-Miner), together with an effective prediction method that evaluates the quality of recency-based patterns in terms of their prediction power. The experimental results show that RF-Miner extracts more diverse and important patterns that can be used to predict the next event, and that it runs more efficiently than baseline algorithms by exploiting the upper bounds of our measure.

Keywords Data mining · Sequential pattern mining · Web clickstream analysis

Responsible editor: M. J. Zaki

Corresponding author: Dong-Wan Choi

¹ Inha University, Incheon, South Korea

1 Introduction

Sequences of events are highly prevalent in many applications, such as website navigation clicks, music playlists, and e-commerce action logs, to name just a few. To analyze these event sequences, sequential pattern mining has been extensively studied as a key problem in the data mining community (Agrawal and Srikant 1995; Srikant and Agrawal 1996; Pei et al. 2001; Zaki 2001; Ayres et al. 2002; Pei et al. 2007). Given a sequence database, the basic process of sequential pattern mining is to extract all the subsequences that appear frequently across multiple sequences according to a pattern frequency metric, often referred to as support. Although the frequent patterns identified by these algorithms help us understand how likely a pattern is to occur in a given sequence, we can miss other interesting patterns that are more useful for prediction, in light of a common trait of human behavior known as the recency effect¹ (Colman 2009). The recency effect is the tendency of a person to remember recent events better than older ones, so that recent information receives greater weight in decision making. This effect implies that the closer the previous events are, the more important they are when a user decides which action to perform next.

Fig. 1 An example of e-commerce website click-streams
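To make the contrast between support and recency in the introduction concrete, here is a minimal Python sketch. `support` counts the sequences that contain a pattern as an ordered, not necessarily contiguous, subsequence, which is the standard frequency notion used by the cited algorithms. `recency_weighted_support` is a hypothetical illustration only and is not the measure proposed in this paper (that measure is not defined in this section): it assumes an exponential `decay` on the closest gap between two events. All function names, the `decay` parameter, and the toy click-stream database are assumptions made for illustration.

```python
from typing import Hashable, List, Optional, Sequence

Event = Hashable


def contains(sequence: Sequence[Event], pattern: Sequence[Event]) -> bool:
    """True if `pattern` occurs in `sequence` in order, gaps allowed."""
    it = iter(sequence)
    # `event in it` advances the shared iterator past the first match,
    # so each pattern event must be found after the previous one.
    return all(event in it for event in pattern)


def support(database: Sequence[Sequence[Event]], pattern: Sequence[Event]) -> int:
    """Standard support: how many sequences in the database contain `pattern`."""
    return sum(1 for seq in database if contains(seq, pattern))


def min_gap(sequence: Sequence[Event], a: Event, b: Event) -> Optional[int]:
    """Smallest distance (in positions) from an `a` to a later `b`, or None."""
    best: Optional[int] = None
    last_a: Optional[int] = None
    for pos, event in enumerate(sequence):
        # Check `b` before updating `a` so that the pattern <x, x> needs two x's.
        if event == b and last_a is not None:
            gap = pos - last_a
            best = gap if best is None else min(best, gap)
        if event == a:
            last_a = pos  # the latest `a` is always the closest one to a later `b`
    return best


def recency_weighted_support(
    database: Sequence[Sequence[Event]], a: Event, b: Event, decay: float = 0.5
) -> float:
    """HYPOTHETICAL recency-aware score for <a, b> (not the paper's measure).

    Each sequence containing <a, b> contributes decay ** (gap - 1), so adjacent
    events count fully and distant ones count less. decay = 1 gives plain support.
    """
    if not 0.0 < decay <= 1.0:
        raise ValueError("decay must lie in (0, 1]")
    total = 0.0
    for seq in database:
        gap = min_gap(seq, a, b)
        if gap is not None:
            total += decay ** (gap - 1)
    return total


# Toy click-stream database (made up for illustration).
db: List[List[str]] = [
    ["home", "search", "view", "cart", "buy"],
    ["home", "view", "buy"],
    ["view", "home", "search", "search", "buy"],
    ["home", "search", "view"],
]

if __name__ == "__main__":
    print(support(db, ["view", "buy"]))                            # 3
    print(recency_weighted_support(db, "view", "buy"))             # 1.625
    print(recency_weighted_support(db, "view", "buy", decay=1.0))  # 3.0
```

On this toy data, ⟨view, buy⟩ has support 3 but a recency-weighted score of only 1.625, because in two of its three occurrences other clicks separate the two events. Setting `decay=1.0` switches the recency discount off and recovers plain support, which shows how a recency-aware score can rank patterns differently from frequency alone.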