A Closed Frequent Subgraph Mining Algorithm in Unique Edge Label Graphs

Problems such as closed frequent subset mining, itemset mining, and connected tree mining can be solved in a polynomial delay. However, the problem of mining closed frequent connected subgraphs is a problem that requires an exponential time. In this paper

  • PDF / 368,132 Bytes
  • 15 Pages / 439.37 x 666.142 pts Page_size
  • 89 Downloads / 164 Views

DOWNLOAD

REPORT


3

Preparatory School of Science and Technology, P.O. Box 218, 23000 Annaba, Algeria 2 LabGED Laboratory, Badji Mokhtar University, P.O. Box 12, 23000 Annaba, Algeria {karabadji,seridi}@labged.net Aalto University, School of Science, P.O. Box 12200, 00076 Espoo, FI, Finland [email protected]

Abstract. Problems such as closed frequent subset mining, itemset mining, and connected tree mining can be solved in a polynomial delay. However, the problem of mining closed frequent connected subgraphs is a problem that requires an exponential time. In this paper, we present ECE-CloseSG, an algorithm for finding closed frequent unique edge label subgraphs. ECE-CloseSG uses a search space pruning and applies the strong accessibility property that allows to ignore not interesting subgraphs. In this work, graph and subgraph isomorphism problems are reduced to set inclusion and set equivalence relations. Keywords: Unique edge labels · Closed frequent subgraph Graph/subgraph isomorphism · Set inclusion/equivalence

1

·

Introduction

The problem of interesting pattern mining is a main task in pattern mining and has several application domains such as social network analysis [6], bioinformatics [1,13] and Web mining [14]. It consists in finding patterns that satisfy a set of constraints. The frequency is one of the most used constraints [4,5,7,10]. A frequent pattern in a given collection/database D is a pattern that occurs at least in δ structures of the database where δ is a given support threshold. In general, the size of the set of frequent patterns is too large. This is due to the downward closure property (all generalizations or sub-patterns of a frequent pattern must be frequent). Thus, the enumeration and the analysis of such a big set of frequent patterns is a challenging problem. To overcome this issue, many research works have focused on special types of frequent subgraphs that allow to restore the set of all frequent subgraphs. These particular patterns are closed and maximal frequent subgraphs. In this context, many efficient algorithms for mining closed and maximal patterns have been proposed such as CloseGraph [18], SPIN [9], Margin [16] and ISG [15]. While mining closed and maximal patterns like itemsets, c Springer International Publishing Switzerland 2016  P. Perner (Ed.): MLDM 2016, LNAI 9729, pp. 43–57, 2016. DOI: 10.1007/978-3-319-41920-6 4

44

N. El Islem Karabadji et al.

keys and trees needs a polynomial delay [17], [11], and [12], the situation gets more complicated when complex patterns such as graphs are considered. The task of mining graph patterns is called frequent subgraph mining (FSM) and includes two main phases: (1) generation of candidates and (2) frequency test. Frequency test has a variable cost according to pattern complexity. For example, itemsets use set inclusion relation to test frequency, whereas trees use subtree isomorphism. In contrast to these cases that need only a polynomial delay, an exponential delay is required for graph databases (using subgraph isomorphism, which is an NP-