Tensor latent block model for co-clustering



REGULAR PAPER

Rafika Boutalbi1 · Lazhar Labiod1 · Mohamed Nadif1

Received: 10 July 2019 / Accepted: 12 October 2019 © Springer Nature Switzerland AG 2020

Abstract
With the exponential growth of collected data in different fields, such as recommender systems (users, items), text mining (documents, terms), and bioinformatics (individuals, genes), co-clustering, the simultaneous clustering of both dimensions of a data matrix, has become a popular technique. Co-clustering aims to obtain homogeneous blocks, leading to a straightforward joint interpretation of row clusters and column clusters. Many approaches exist; in this paper, we rely on the latent block model (LBM), which is flexible enough to model different types of data matrices. We extend its use to tensor (3D matrix) data by proposing a Tensor LBM (TLBM) that allows different relations between entities. To show the interest of TLBM, we consider continuous, binary, and contingency-table datasets. To estimate the parameters, a variational EM algorithm is developed. Its performance is evaluated on synthetic and real datasets to highlight different possible applications.

Keywords Co-clustering · Tensor · Data science
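The notion of homogeneous blocks described above can be illustrated with a small sketch. Note that this is not the paper's TLBM: it uses scikit-learn's spectral co-clustering on a synthetic matrix with planted blocks, purely to show that one label is produced per row and per column simultaneously.

```python
import numpy as np
from sklearn.cluster import SpectralCoclustering
from sklearn.datasets import make_biclusters

# Toy matrix with 3 planted blocks (illustrative only, not from the paper)
data, rows, cols = make_biclusters(
    shape=(20, 12), n_clusters=3, noise=0.5, random_state=0
)

model = SpectralCoclustering(n_clusters=3, random_state=0)
model.fit(data)

# One cluster label per row and one per column: a co-clustering
print(model.row_labels_)     # shape (20,)
print(model.column_labels_)  # shape (12,)
```

Reordering the rows and columns of `data` by these labels makes the homogeneous blocks appear along the diagonal, which is the "straightforward simultaneous interpretation" mentioned in the abstract.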

1 Introduction

Co-clustering addresses the problem of simultaneously clustering both dimensions of a data matrix. Many of the datasets encountered in data science are two-dimensional in nature and can be represented by a matrix. Classical clustering procedures seek to construct an optimal partition of rows (individuals) or, sometimes, of columns (features), but not both. In contrast, co-clustering methods cluster the rows and the columns simultaneously and organize the data into homogeneous blocks (after suitable permutations); see, for instance, [1,4,11,15,16,18,19,25,26,30,31]. Methods of this kind have practical importance in a wide variety of applications where data are typically organized in two-way tables. However, in modern datasets, instead of collecting data on

This submission is an extended version of the PAKDD 2019 paper 'Co-clustering from Tensor Data'.

Rafika Boutalbi (corresponding author)
[email protected]

Lazhar Labiod
[email protected]

Mohamed Nadif
[email protected]
every individual-feature pair, we may collect supplementary individual or item information, leading to a tensor representation. This kind of data has emerged in many fields, such as recommender systems, where data are collected on multiple items rated by multiple users; information about users and items is also available, yielding a tensor rather than a data matrix. Despite the great interest in co-clustering techniques, on the one hand, and in tensor representations, on the other, few works tackle co-clustering from tensor data. We mention the work based on minimum Bregman information (MBI) to carry out co-clustering [3] and the general tensor spectral co-clustering (GTSC) method suitable for nonnegative tensor data [35]. Other approaches can be cited although the goal is not exactly co-clustering
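As a concrete illustration of such a tensor representation, several relations observed on the same (user, item) pairs can be stacked along a third mode, one matrix slice per relation. The data below are hypothetical and serve only to show the shape of the resulting 3D array.

```python
import numpy as np

# Hypothetical toy data: 4 users x 3 items, two relations per pair
ratings = np.array([[5, 0, 3],
                    [4, 0, 0],
                    [0, 2, 5],
                    [0, 1, 4]], dtype=float)
clicks = np.array([[1, 0, 1],
                   [1, 0, 0],
                   [0, 1, 1],
                   [0, 1, 1]], dtype=float)

# Stack the slices along a third axis: tensor of shape (users, items, relations)
X = np.stack([ratings, clicks], axis=2)
print(X.shape)  # (4, 3, 2)
```

A co-clustering method for tensor data, such as the TLBM proposed here, seeks row and column partitions that are homogeneous across all slices of `X` jointly, rather than on a single matrix.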