Assessing the interestingness of temporal rules with Sequential Implication Intensity

In this article, we study the assessment of the interestingness of sequential rules (generally temporal rules). This is a crucial problem in sequence analysis since the frequent pattern mining algorithms are unsupervised and can produce huge amounts of ru

  • PDF / 2,405,871 Bytes
  • 17 Pages / 595.276 x 841.89 pts (A4) Page_size
  • 89 Downloads / 180 Views

DOWNLOAD

REPORT


Introduction Frequent pattern discovery in sequences of events1 (generally temporal sequences) is a major task in data mining. Research work in this domain consists of two approaches: • discovery of frequent episodes in a long sequence of events (approach initiated by Mannila, Toivonen, and Verkamo [12, 13]), • discovery of frequent sequential patterns in a set of sequences of events (approach initiated by Agrawal and Srikant [1, 17]). The similarity between episodes and sequential patterns is that they are sequential structures, i.e., a structure defined with an order (partial or total). Such a structure can be, for example: 1

Here we speak about sequences of qualitative variables. Such sequences are generally not called time series.

J. Blanchard et al.: Assessing the interestingness of temporal rules with Sequential Implication Intensity, Studies in Computational Intelligence (SCI) 127, 55–71 (2008) www.springerlink.com © Springer-Verlag Berlin Heidelberg 2008

56

Blanchard et al.

breakfast then lunch then dinner The structure is described by its frequency (or support) and generally by constraints on the event position, like a maximal time window “less than 12 hours stand between breakfast and dinner ” [5, 10, 14, 17, 18]. The difference between episodes and sequential patterns lies in the measure of their frequency: frequency of episodes is an intra-sequence notion [5, 10, 14, 18–20], while frequency of sequential patterns is an inter-sequence notion [1, 8, 16, 17, 21] (see [11] for a synthesis on the different ways of assessing frequency). Thus, the frequent episode mining algorithms search for structures which often recur inside a single sequence. On the other hand, the frequent sequential pattern mining algorithms search for structures which recur in numerous sequences (independently of the repetitions in each sequence). These last algorithms are actually an extension to sequential data of the frequent itemset mining algorithms, used among other things to generate association rules [2, 9]. Just as the discovery of frequent itemsets leads to the generation of association rules, the discovery of episodes/sequential patterns is often followed by a sequential rule generation stage which enables predictions to be made within the limits of a time window [5, 10, 14, 16–19, 21]. Such rules have been used to predict, for example, stock market prices [5] or events in a telecommunication network [14, 18]. A sequential rule can be for instance: 6h

breakfast −−−→ lunch This rule means “if one observe breakfast then one will certainly observe lunch less than 6 hours later”. In this article, we study the assessment of the interestingness of sequential rules. This is a crucial problem in sequence analysis since the frequent pattern mining algorithms are unsupervised and can produce a huge number of rules. While association rule interestingness has been widely studied in the literature (see [3] for a survey), there are few measures dedicated to sequential rules. In addition to frequency, one mainly finds an index of conf