Uncertain maximal frequent subgraph mining algorithm based on adjacency matrix and weight

  • PDF / 2,077,167 Bytes
  • 11 Pages / 595.276 x 790.866 pts Page_size
  • 40 Downloads / 190 Views

DOWNLOAD

REPORT


ORIGINAL ARTICLE

Uncertain maximal frequent subgraph mining algorithm based on adjacency matrix and weight Di Wu1 · Jiadong Ren2 · Long Sheng1 

Received: 17 May 2016 / Accepted: 20 February 2017 © Springer-Verlag Berlin Heidelberg 2017

Abstract  How to mine many interesting subgraphs in uncertain graph has become an important research field in data mining. In this paper, a novel algorithm Uncertain Maximal Frequent Subgraph Mining Algorithm Based on Adjacency Matrix and Weight (UMFGAMW) is proposed. The definition of the adjacency matrix and the standard matrix coding for uncertain graph are presented. The correspondence between the adjacency matrix and uncertain graph is established. A new vertex ordering policy for computing the standard coding of uncertain graph adjacency matrix is designed. The complexity of uncertain graph standard coding is reduced, and the matching speed of uncertain subgraph standard coding is improved. The definition of the weight of uncertain graph and the mean weight of uncertain edge is proposed. The importance of the uncertain subgraphs that meet the minimum support threshold in the graph dataset is fully considered. Finally, a depth-first search weighted uncertain maximal frequent subgraph mining algorithm is discussed. According to the limiting condition of the uncertain maximum frequent subgraph and weighed uncertain edge, the number of mining results is reduced effectively. Experimental results demonstrate that the UMFGAMW algorithm has higher efficiency and better scalability. Keywords  Adjacency matrix · Weight · Uncertain graph · Maximum frequent subgraph · Frequent subgraph mining

* Di Wu [email protected] 1

Hebei University of Engineering, Handan, Hebei, China

2

Yanshan University, Qinhuangdao, Hebei, China



1 Introduction Recently, more areas of science attempt to describe complex structure objects by graph structure [1]. In the field of biology and medicine, scientists often use the graph structure to express the internal structure of macromolecules. In the internet field, the graph structure is used to describe the link relationships and views between websites. Since graph data have been more widely applied, graph data mining has become a hot topic in data mining research [2]. In practical applications, graph data have the following characteristics [3–5]: First, there is a large quantity amount of data. Second, the data have a fast data growth rate. The update of graph data is highly frequent. Because of these features and immature technical conditions, the accuracy of obtained graph data is poor, and uncertainty exists. For example, the field of molecular biology utilizes graph data to describe protein and its interaction network. Protein is often denoted by vertices, and the relationship of proteins is represented by the connection between vertices. Due to the restriction of detection methods during experiments, certain parts can not be accurately detected, creating uncertainty. The graph data shows the presence of the edge based on a certain probability. In su