A competitive optimization approach for data clustering and orthogonal non-negative matrix factorization



Ja’far Dehghanpour-Sahron1 · Nezam Mahdavi-Amiri1

Received: 3 December 2019 / Revised: 31 March 2020
© Springer-Verlag GmbH Germany, part of Springer Nature 2020

Abstract
Partitioning a given data-set into subsets based on similarity among the data is called clustering. Clustering is a major task in data mining and machine learning, with many applications such as text retrieval, pattern recognition, and web mining. Here, we briefly review some clustering-related problems (k-means, normalized k-cut, orthogonal non-negative matrix factorization, ONMF, and isoperimetry) and describe their connections. We formulate the relaxed mean version of the isoperimetry problem as an optimization problem with non-negativity and orthogonality constraints. We first use a gradient-based optimization algorithm to solve this kind of problem, and then apply a post-processing technique to extract a solution of the clustering problem. We also propose a simplified approach to improve the solution of the 2-dimensional clustering problem using the N-nearest neighbor graph. Inspired by this technique, we apply a multilevel method for clustering a given data-set that reduces the size of the problem by grouping a number of similar vertices; this number is determined by two values, namely, the maximum and the average of the edge weights of the vertices connected to a selected vertex. In addition, using the connections between ONMF and k-means and between k-means and the isoperimetry problem, we propose an algorithm to solve the ONMF problem. A comparative performance analysis with other related methods shows that our approach outperforms them, in terms of the obtained misclassification error rate and Rand index, on benchmark, randomly generated, and hard synthetic data-sets.

Keywords Clustering · Multilevel method · Normalized k-cut · Optimization problem · Orthogonal non-negative matrix factorization

Mathematics Subject Classification 65K10 · 90C27
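For orientation, a commonly used statement of the ONMF problem mentioned in the abstract is sketched below in generic notation (X, W, H, and k are illustrative symbols, not taken from the paper's own development): given a non-negative data matrix and a target number of clusters k, one seeks a non-negative factorization in which one factor has orthonormal rows.

% Standard ONMF formulation (a sketch in generic notation, for orientation only):
% X in R^{d x n} is the data matrix, k the number of clusters,
% W >= 0 and H >= 0 the factors, with H H^T = I_k enforcing orthogonality.
\begin{equation*}
  \min_{W \in \mathbb{R}^{d \times k},\; H \in \mathbb{R}^{k \times n}}
    \; \lVert X - W H \rVert_F^2
  \quad \text{s.t.} \quad W \ge 0,\; H \ge 0,\; H H^{\mathsf{T}} = I_k .
\end{equation*}

The combination of non-negativity and row-orthogonality on H forces (near-)disjoint supports across the rows of H, which is why ONMF is closely tied to hard clustering.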

Nezam Mahdavi-Amiri
[email protected]

Ja’far Dehghanpour-Sahron
[email protected]

1 Faculty of Mathematical Sciences, Sharif University of Technology, P. O. Box 11155-9415, Tehran, Iran


1 Introduction

For a given data-set containing n points in R^d, or a matrix in R^{d×n} (where each column of the matrix is regarded as a point in R^d), and a given number of clusters k, 1 ≤ k ≤ n, we are to partition the data-set into k clusters such that points in the same cluster are more similar to each other than to points in other clusters. Based on the similarity measure, various objective functions have been defined for clustering, and a variety of problems, such as k-means and normalized k-cut, have been devised. Partitioning problems such as the k-means and normalized cut problems, despite differences in their cost functions, have a common feature: each point must be assigned to exactly one cluster, as made explicit in the objective sketched below.
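To make the hard-assignment feature concrete, a standard statement of the k-means objective is given below in generic notation (x_i, C_j, and μ_j are illustrative symbols, not the paper's own): each point contributes the squared distance to the centroid of the single cluster it is assigned to.

% k-means objective (a sketch in generic notation): partition the points
% x_1, ..., x_n into clusters C_1, ..., C_k, where mu_j is the centroid of C_j.
\begin{equation*}
  \min_{\{C_1,\dots,C_k\}} \; \sum_{j=1}^{k} \sum_{x_i \in C_j} \lVert x_i - \mu_j \rVert_2^2,
  \qquad
  \mu_j = \frac{1}{|C_j|} \sum_{x_i \in C_j} x_i ,
\end{equation*}

where each x_i belongs to exactly one cluster C_j. Normalized cut replaces this centroid-based cost with a graph-cut cost, but retains the same one-point-one-cluster structure.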