Comparison of Non-negative Matrix Factorization Methods for Clustering Genomic Data
1 School of Information Science and Engineering, Qufu Normal University, Rizhao 276826, China
{mixiaohou,shangjunliang110}@163.com, {sdcavell,zhengch99}@126.com
2 Library of Qufu Normal University, Qufu Normal University, Rizhao 276826, China
[email protected]
3 Shenzhen Graduate School, Bio-Computing Research Center, Harbin Institute of Technology, Shenzhen 518055, China
Abstract. Non-negative matrix factorization (NMF) is a useful method for dimensionality reduction and has been widely used in many fields, such as pattern recognition and data mining. Compared with traditional methods, it has unique advantages. A growing number of improved NMF methods have been proposed in recent years, each with merits and demerits in different applications. Clustering based on NMF is a common way to assess the properties of these methods. However, there is no dedicated comparison of NMF-based clustering experiments on genomic data. In this paper, we analyze the characteristics of basic NMF and its classical variants. Moreover, we show the clustering results based on the coefficient matrix obtained by NMF methods on genomic datasets. We also compare the clustering accuracy and time cost of these methods.

Keywords: Non-negative matrix factorization · Dimensionality reduction · Clustering · Genomic data
© Springer International Publishing Switzerland 2016 D.-S. Huang and K.-H. Jo (Eds.): ICIC 2016, Part II, LNCS 9772, pp. 290–299, 2016. DOI: 10.1007/978-3-319-42294-7_25

1 Introduction

With the arrival of the era of big data, massive high-dimensional data are generated continuously. Reducing the dimensionality of such data for storage, processing and reconstruction is a challenge in machine learning and data mining. There are numerous traditional dimensionality reduction methods, such as Principal Component Analysis (PCA) and Linear Discriminant Analysis (LDA). These methods allow negative values, which is not applicable in some cases. They also rely on linear dimensionality reduction, which is not conducive to retaining the characteristics of the data. As a novel matrix factorization method, NMF overcomes many problems of the traditional
matrix factorization methods and provides a deeper view of the data. NMF obtains two non-negative matrices whose product approximates the original data matrix, which reflects the concept of parts-based representation in human thought. By reducing dimensionality, NMF yields a local representation of high-dimensional data. It has been successfully used in bioinformatics, for example in genome sequence feature recognition, local feature recognition and biological literature mining. In recent years, many scholars have used NMF methods in clustering experiments, such as document clustering, image clustering and tumor clustering. But there are no clear comparisons of clustering experiments on genomic data,
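The factorization described in the introduction, in which two non-negative matrices approximate the data matrix and the coefficient matrix is then used for clustering, can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure evaluated in the paper: it assumes the Frobenius-norm objective with Lee-Seung multiplicative updates, it assigns each sample to the component with the largest coefficient (one common convention), and the function names `nmf_multiplicative` and `cluster_from_coefficients` as well as the toy matrix are hypothetical.

```python
import numpy as np


def nmf_multiplicative(X, k, n_iter=500, eps=1e-10, seed=0):
    """Approximate a non-negative matrix X (m x n) as W @ H with W, H >= 0.

    Assumption: Frobenius-norm objective with Lee-Seung multiplicative
    updates; the excerpt does not state which update rule the paper uses.
    """
    rng = np.random.default_rng(seed)
    m, n = X.shape
    W = rng.random((m, k)) + eps  # basis matrix, m x k
    H = rng.random((k, n)) + eps  # coefficient matrix, k x n
    for _ in range(n_iter):
        # Multiplicative updates keep every entry non-negative.
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H


def cluster_from_coefficients(H):
    """Assign each sample (a column of H) to its largest coefficient."""
    return np.argmax(H, axis=0)


# Hypothetical toy data: 6 features x 8 samples forming two sample groups.
X = np.zeros((6, 8))
X[:3, :4] = 1.0
X[3:, 4:] = 1.0
X += 0.01  # small offset so that all entries are strictly positive

W, H = nmf_multiplicative(X, k=2)
labels = cluster_from_coefficients(H)
print(labels)
```

Here the columns of `X` are treated as samples, so the columns of `H` carry the low-dimensional representation on which the cluster labels are based. On real genomic data the rank `k`, the number of iterations and the initialization would all need to be chosen with care.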