DC Programming and DCA for Solving Minimum Sum-of-Squares Clustering Using Weighted Dissimilarity Measures
Abstract. In this paper, we study new efficient approaches based on DC (Difference of Convex functions) programming and DCA (DC Algorithm) for MSSC (Minimum Sum-of-Squares Clustering) using weighted dissimilarity measures. The two most widely used models of MSSC, namely the bilevel program and the mixed integer program, are studied. It turns out that both optimization problems can be reformulated as DC programs, and efficient DCA schemes are then developed. Experimental results on real-world datasets illustrate the efficiency of our proposed algorithms and their superiority over standard algorithms in terms of solution quality.
Keywords: Clustering, MSSC, Feature Weighting, Optimization, DC Programming, DCA.
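As background, the DC programming framework referred to in the abstract can be sketched in its standard textbook form; this is the generic scheme, not the specific DCA schemes developed later in the paper. A DC program is
\[
\alpha = \inf \bigl\{ f(x) := g(x) - h(x) \;:\; x \in \mathbb{R}^n \bigr\}, \qquad g,\ h \ \text{convex},
\]
and, starting from an initial point $x^0$, the generic DCA constructs two sequences by
\[
y^k \in \partial h(x^k), \qquad x^{k+1} \in \operatorname*{arg\,min}_{x \in \mathbb{R}^n} \bigl\{ g(x) - \langle x,\, y^k \rangle \bigr\}.
\]
Each iteration thus replaces the concave part $-h$ by its affine majorization at $x^k$ and solves the resulting convex program.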
1 Introduction
Clustering (unsupervised classification) is a fundamental problem in unsupervised learning and has many applications in various domains. Clustering is used to partition data into groups of similar elements without prior knowledge of the group definitions. On the other hand, the growth of technologies nowadays leads to an exponential increase in recorded data, both in dimensionality and in sample size. In many applications such as e-commerce, computational biology, text classification, and image analysis, datasets are of large volume and contain a large number of features. However, a large number of features can cause problems in the classification task. Features can usually be divided into three categories: relevant, redundant, and irrelevant features. Relevant features are essential for the classification process, redundant features add no new information to the classifier (i.e., information already carried by other features), while irrelevant features do not carry any useful information. The performance of classification algorithms can be significantly degraded if many irrelevant or redundant features are used. Feature selection is one of the techniques to deal with irrelevant or redundant features. Feature selection methods aim to select a subset of features that
minimize redundancy while preserving or improving the classification rate of the algorithm. Recently, feature weighting has attracted the attention of many researchers. Feature weighting can be seen as an extension of feature selection. In feature selection, each feature is assigned a binary decision variable (value 1 means that the feature is selected, while value 0 means that it is removed). In feature weighting, each feature is assigned a continuous value, called a weight, in the interval [0, 1]. Relevant features correspond to high weight values, whereas weight values close to zero indicate irrelevant features. In contrast to feature selection, the main objective of feature weighting is to improve the quality of the classification algorithm, not to reduce the number of features. Feature weighting has been applied successfully in many classification algorithms, for instance feature weighting in SVM.
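To make the feature-weighting idea concrete in the MSSC setting, a commonly used weighted dissimilarity measure and the resulting clustering objective can be written as follows; the exponent $\beta$ and the simplex constraint on the weights are assumptions of this illustration rather than the exact model studied later in the paper. Given $n$ data points $x^1,\dots,x^n \in \mathbb{R}^d$, $k$ cluster centers $c^1,\dots,c^k$, and feature weights $\lambda \in [0,1]^d$ with $\sum_{j=1}^{d} \lambda_j = 1$, the weighted squared Euclidean dissimilarity is
\[
d_\lambda(x, c) = \sum_{j=1}^{d} \lambda_j^{\beta} \, (x_j - c_j)^2, \qquad \beta > 1,
\]
and the feature-weighted MSSC problem minimizes, over both the centers and the weights,
\[
\min_{c,\,\lambda} \; \sum_{i=1}^{n} \; \min_{\ell = 1,\dots,k} \; \sum_{j=1}^{d} \lambda_j^{\beta} \bigl(x^i_j - c^\ell_j\bigr)^2 .
\]
Features that help separate the clusters receive larger weights $\lambda_j$, while uninformative features are driven toward zero.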