Granular computing for relational data classification

Piotr Hońko

Received: 29 November 2012 / Revised: 15 February 2013 / Accepted: 18 February 2013
© The Author(s) 2013. This article is published with open access at Springerlink.com

Abstract We propose a novel framework for generating classification rules from relational data. It is a specialized version of the general framework intended for mining relational data and is defined in granular computing theory. Within the proposed framework, we define a method for deriving information granules from relational data. Such granules are the basis for generating relational classification rules. Our approach follows the granular computing idea of switching between different levels of granularity of the universe. As a result, a granule-based relational data representation can easily be replaced by another one and thereby adjusted to a given data mining task, e.g. classification. A generalized relational data representation, as defined in the framework, can be treated as the search space for generating rules, which may significantly limit the size of that search space. Furthermore, our framework, unlike others, unifies not only the way the data and the rules to be derived are expressed and specified, but also, in part, the process of generating rules from the data: the rules can be obtained directly from the information granules or constructed based on them.

Keywords Multi-relational data mining · Database models · Granular computing · Classification

P. Hońko (✉) Department of Computer Science, Bialystok University of Technology, Wiejska 45A, 15-351 Bialystok, Poland e-mail: [email protected]

J Intell Inf Syst

1 Introduction

The task of classification has been studied extensively in the field of data mining (see, e.g., Han et al. 2011; Tan et al. 2005; Banks et al. 2004). It has also been widely investigated for relational data (see, e.g., Džeroski and Lavrač 2001b; Zhen et al. 2009; Thangaraj and Vijayalakshmi 2011). Many different techniques and algorithms exist for classifying relational data; however, a unified framework for this task does not seem to have been introduced so far. Such a framework is needed to unify operations that are independent of the technique or algorithm applied for classifying relational data. The following essential operations need to be unified: relational object representation, search space limitation and generation of relational patterns. These issues are briefly discussed below.

1. An object of a single-table database is represented by a tuple of table attribute values. An object of a database with a relational structure can be represented not only by a tuple that belongs to the table to be analyzed, but also by a certain part of the tuples of other tables that are directly or indirectly joined to the table under consideration. Therefore, relational object representation can vary depending on a given data mining task.

2. The search space for discoverin