An experimental study of graph-based semi-supervised classification with additional node information


Bertrand Lebichot1 · Marco Saerens1

Received: 26 May 2018 / Revised: 21 July 2020 / Accepted: 25 July 2020 / Published online: 9 October 2020 © Springer-Verlag London Ltd., part of Springer Nature 2020

Abstract The volume of data generated by the internet and social networks is increasing every day, and there is a clear need for efficient ways of extracting useful information from them. As this information can take different forms, it is important to use all the available data representations for prediction; this is often referred to as multi-view learning. In this paper, we consider semi-supervised classification using both regular, plain, tabular data and structural information coming from a network structure (feature-rich networks). Sixteen techniques are compared and can be divided into three families: the first uses only the plain features to fit a classification model, the second uses only the network structure, and the last combines both information sources. These three settings are investigated on 10 real-world datasets. Furthermore, network embedding and well-known autocorrelation indicators from spatial statistics are also studied. Possible applications are automatic classification of web pages or other linked documents, of nodes in a social network, or of proteins in a biological complex system, to name a few. Based on our findings, we draw some general conclusions and offer advice for tackling this particular classification task: it is clearly observed that some dataset labelings can be better explained by their graph structure or by their feature set.

Keywords Network data analysis · Semi-supervised classification · Link analysis · Graph mining · Multi-view learning

Bertrand Lebichot, Machine Learning Group – ICTEAM & LSM, Université catholique de Louvain, Place des Doyens 1, 1348 Louvain-la-Neuve, Belgium

1 Introduction
Nowadays, with the increasing volume of data generated, for instance by the internet and social networks, there is a need for efficient ways to infer useful information from those network-based data. Moreover, these data can take several different forms and, in that case, it would be useful to use these alternative views in the prediction model; this is exactly the purpose of multi-view learning [77,86]. In this paper, we focus our attention on supervised classification using both regular tabular data defined on nodes and structural information coming from graphs or networks.1 This kind of data is sometimes called feature-rich networks.

Of course, as discussed in [26] (see, e.g., [46] for a survey), many different approaches have been developed for information fusion in machine learning, pattern recognition and applied statistics. This includes [26] simple weighted averages (see, e.g., [15,40]), Bayesian fusion (see, e.g., [15,40]), majority vote (see, e.g., [13,43,47]), models coming from uncertainty reasoning [44] (see, e.g., [
