AI 2003: Advances in Artificial Intelligence 16th Australian Con

Consider the problem of a robot (algorithm, learning mechanism) moving along the real line attempting to locate a particular point ? . To assist the me- anism, we assume that it can communicate with an Environment (“Oracle”) which guides it with informati

  • PDF / 17,259,557 Bytes
  • 1,095 Pages / 430 x 660 pts Page_size
  • 40 Downloads / 236 Views

DOWNLOAD

REPORT


Abstract. Emerging patterns are sets of items whose frequency changes significantly from one dataset to another. They are useful as a means of discovering distinctions inherently present amongst a collection datasets and have been shown to be a powerful method for constructing accurate classifiers. In this paper, we present different varieties of emerging patterns, discuss efficient techniques for their discovery and explain how they can be used in classification.

1

Introduction

Discovery of powerful distinguishing features between datasets is an important objective in data mining. An important class of patterns that can represent strong contrasts is known as emerging patterns. An emerging pattern is a set of items whose frequency changes significantly from one dataset to another. Emerging patterns have been successfully used in constructing highly accurate classifiers [19, 10, 20]. In particular, for predicting the likelihood of diseases such as leukaemia [21] and discovering patterns in gene expression data [22]. As an example, consider the following database defined on six attributes: (with all possible attribute values given in parentheses) Attr1 (a, b, c), Attr2 (d, e, f ), Attr3 (g, h, i), Attr4 (j, k, l), Attr5 (m, n, o) and Attr6 (p, q, r). The minimal emerging patterns in this dataset are sets of items which appear in one class and not in the other (by minimal, we mean that no proper subset of the items in the pattern should also be an emerging pattern). Inspecting the table, we see that patterns appearing in Class 1 include {a, i}, {o, p}, and {b, g}. Patterns appearing in Class 2 include {f } and {k}. These patterns represent very strong changes (zero frequency in one dataset and non-zero frequency in

Table 1. Sample Database Class 1 Instances Class 2 Instances {a, d, i, l, o, p} {c, d, i, l, o, r} {b, d, g, l, o, r} {a, f, g, k, o, q} {c, d, h, l, o, r} {b, d, h, j, m, p} T.D. Gedeon and L.C.C. Fung (Eds.): AI 2003, LNAI 2903, pp. 1–12, 2003. c Springer-Verlag Berlin Heidelberg 2003 

2

Kotagiri Ramamohanarao and James Bailey

the other) and they are given a special name to indicate this property - jumping emerging patterns. Other useful, but less restrictive emerging patterns can also be defined. e.g. Constrained emerging patterns are minimal sets of items which occur ≥ α times in one class and ≤ β times in the other. The set of items {d, o} is such a pattern appearing in Class 1, for α = 3 and β = 1. In this paper, we discuss the principal types of emerging patterns, and focus on two principal issues – Methods for Efficient Pattern Discovery: This is a challenge, especially for high volume, high dimensionality datasets, since in the worst case, the number of patterns present in the data can be exponential. – Classification Methods: Constructing emerging pattern classifiers requires a number of different decisions to be made. What type of patterns should be mined, which patterns should be used in classification and the how are patterns weighted in the scoring function. An outline of the rest of the paper is as fo