Two-Phase Algorithms for a Novel Utility-Frequent Mining Model

When companies seek for the combination of products which can constantly generate high profit, the association rule mining (ARM) or the utility mining will not achieve such task. ARM mines frequent itemsets without knowing the producing profit. On the oth

  • PDF / 268,164 Bytes
  • 12 Pages / 430 x 660 pts Page_size
  • 33 Downloads / 180 Views

DOWNLOAD

REPORT


Department of Computer Science and Information Management, Providence University, Taichung 433, Taiwan [email protected] 2 Department of Computer Science and Information Engineering, Southern Taiwan University, Yung-Kang City, Tainan 71005, Taiwan [email protected] 3 Department of Information Engineering and Computer Science, Feng Chia University, Taichung 40724, Taiwan 4 Department of Computer Science and Information Engineering, National Chung Cheng University, Chiayi 62102, Taiwan [email protected]

Abstract. When companies seek for the combination of products which can constantly generate high profit, the association rule mining (ARM) or the utility mining will not achieve such task. ARM mines frequent itemsets without knowing the producing profit. On the other hand, the utility mining seeks high profit items but no guarantee the frequency. In this paper, we propose a novel utility-frequent mining model to identify all itemsets that can generate a user specified utility in transactions, in which the percentage of such transactions in database is not less than a minimum support threshold. A utility-frequent itemset indicates that such combination of products can constantly generate high profit. For finding all utility-frequent itemsets, there is no efficient strategy due to the nonexistence of “downward/upward closure property”. In order to tackle such challenge, we propose a bottom-up two-phase algorithm, BU-UFM, for efficiently mining utility-frequent itemsets. We also introduce a novel concept, quasi-utility-frequency, which is upward closed with respect to the lattice of all itemsets. In fact, each utility-frequent itemset is also quasi-utility-frequent. A top-down two-phase algorithm, TD-UFM, for mining utility-frequent itemsets is also presented in the paper.

1

Introduction

Data Mining has made a profound impact on business practices and knowledge management in recent years. Association Rule Mining (or market basket analysis), finding interesting association or correlation relationships among data items, is one of the most important data mining strategies. Since the concept of association rules was introduced by Agrawal et al. [2] in 1993, many algorithms T. Washio et al. (Eds.): PAKDD 2007 Workshops, LNAI 4819, pp. 433–444, 2007. c Springer-Verlag Berlin Heidelberg 2007 

434

J.-S. Yeh, Y.-C. Li, and C.-C. Chang

and techniques for mining association rules have been proposed in the literature. Traditional association rule mining (ARM) model treats all the items in the database equally by only considering if an item is present in a transaction or not. ARM focuses on deriving correlations among a set of items and their association rules. Although finding correlations of itemsets is very important, frequent itemsets identified by ARM may only contribute a small portion of the overall utility. In many situations, people may be more interested in finding out how a set of items support a specific objective that they want to achieve. Recently, a utility mining model was defined [17]. The goal of utility mining is to identif