Incomplete Multi-view Clustering


Abstract. Real data often consist of multiple views (or representations). By exploiting the complementary and consensus grouping information of multiple views, multi-view clustering has become a successful practice for boosting clustering accuracy over the past decades. Recently, researchers have begun paying attention to the problem of incomplete views. Generally, they assume that at least one view is complete, or they focus only on two-view problems. However, these assumptions are often violated in real tasks. In this work, we propose an IVC algorithm for clustering with more than two incomplete views. Compared with existing works, our proposed algorithm (1) does not require any view to be complete, (2) does not limit the number of incomplete views, and (3) can handle similarity data as well as feature data. The proposed algorithm is based on spectral graph theory and the kernel alignment principle. By aligning the projections of individual views with the integrated projection of all views, IVC exchanges the complementary grouping information of incomplete views. Consequently, the projections of individual views are made complete, resulting in a consensus with accurate grouping information. Experiments on synthetic and real datasets demonstrate the effectiveness of IVC.

Keywords: Multi-view clustering · Incomplete view clustering · Spectral clustering

1 Introduction

H. Gao et al.: Incomplete Multi-view Clustering. In: Z. Shi et al. (Eds.): IIP 2016, IFIP AICT 486, pp. 245–255, 2016. DOI: 10.1007/978-3-319-48390-0_25. © IFIP International Federation for Information Processing 2016. Published by Springer International Publishing AG 2016. All Rights Reserved.

Many real-world datasets naturally comprise heterogeneous views (or representations). Clustering such data is commonly referred to as multi-view clustering. Under the assumptions that the views represent the data in complementary ways and that their clusterings should agree on a consensus decision, multi-view clustering has the potential to dramatically increase learning accuracy over single-view clustering [1]. The main problem in multi-view clustering is how to integrate the grouping information of individual views. Existing works can be roughly classified into three categories. (1) Multi-kernel learning based approaches. The most representative work in this category is Multi-kernel Kmeans [2]. It first uses a kernel representation for each view, and then incorporates the different views by seeking an optimal combination of their kernels. (2) Subspace learning based approaches. These obtain a latent consensus subspace shared by multiple views and cluster the instances in that subspace. There are many works in this category, including CCA-based methods [3], spectral graph based methods [4–6], and matrix factorization based methods [7,8]. (3) Ensemble learning based approaches. The method in [9] makes a decision in each individual view separately and then combines the decisions of all views to establish a consensus decision by determining cluster agreements/disagreements. Traditional research assumes data are compl