DKDD_C: A Clustering-Based Approach for Distributed Knowledge Discovery
Abstract. In this paper, we address the problem of knowledge discovery. Several approaches have been proposed in this field. However, existing approaches generate a huge number of association rules that are difficult to exploit and assimilate. Moreover, they have not proven themselves in a distributed context. As a contribution, we propose DKDD_C, a new Distributed Knowledge Discovery approach. Exploiting KDD based on data classification, we give the user the choice either to generate Meta-Rules (rules between classes arising from a preliminary data classification) or to generate classical Rules between distributed data. DKDD_C takes place in both local and global processes. We show that our solution minimizes the number of generated distributed association rules and thus offers a better interpretation of the data and an optimized execution time. This approach has been validated by the implementation of a user-friendly platform, an extension of the Weka platform that supports Distributed KDD.
Keywords: Distributed knowledge discovery · Mining association rules · Distributed database · Clustering · Weka platform extension
M. Bouraoui et al.
© Springer International Publishing Switzerland 2016
Y. Tan et al. (Eds.): ICSI 2016, Part II, LNCS 9713, pp. 187–197, 2016. DOI: 10.1007/978-3-319-41009-8_20
1 Introduction
Nowadays, our ability to collect and store data of any type exceeds our capacity for analysis, synthesis and Knowledge Discovery in Data (KDD). Moreover, the performance of conventional centralized approaches degrades, in terms of execution time and memory space, as the size of the processed data increases, hence the emergence of Distributed Knowledge Discovery (DKDD). Several approaches and tools have been proposed in this context. Through our study, we found that these theoretical and practical approaches have different limits:
• Theoretically, DKDD algorithms generate a huge number of association rules that are difficult to exploit and assimilate.
• Practically, existing tools (1) support only some KDD algorithms, which generate a large number of association rules that are difficult to assimilate; (2) have not proven themselves in a distributed context; and (3) are applied to only one restricted type of data.
We propose, in this paper, DKDD_C, a distributed knowledge discovery approach based on classification, which minimizes the number of generated distributed association rules and thus offers a better interpretation of the data and optimizes both memory space and execution time. By exploiting KDD based on data classification, we give the user the choice either to generate Meta-Rules (rules between classes arising from a preliminary data classification) or to generate Rules between distributed data without preliminary classification. This approach has been validated by the implementation of a user-friendly platform, an extension of the Weka platform that supports DKDD. This paper is organized as follows: S