Convex clustering method for compositional data modeling
METHODOLOGIES AND APPLICATION
Xiaokang Wang1 · Huiwen Wang1,2 · Zhichao Wang3 · Jidong Yuan4
© Springer-Verlag GmbH Germany, part of Springer Nature 2020
Abstract Compositional data are vectors whose parts are positive and subject to a constant-sum constraint. Real-world examples include a vector in which each entry is the weight of a stock in an investment portfolio, or the relative concentrations of air pollutants in the environment. In this study, we develop a convex clustering approach for grouping compositional data. Convex clustering is attractive because, as a convex relaxation of hierarchical clustering, it yields a globally optimal solution. However, when it is applied directly to compositions, the clustering result offers little interpretability because it ignores the unit-sum constraint of compositional data. We therefore discuss the clustering of compositional variables in the Aitchison framework with an isometric log-ratio (ilr) transformation. The objective function is formulated as a combination of an L2-norm loss term and an L1-norm regularization term and is then solved efficiently using the alternating direction method of multipliers. In the numerical simulations, clustering ilr-transformed data is more accurate than directly clustering untransformed compositional data. To demonstrate its practical use, the proposed method is also tested on several real-world datasets.
Keywords Compositional data analysis · Aitchison geometry · Convex clustering · Alternating direction method of multipliers (ADMM)
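The ilr transformation named in the abstract can be illustrated with a minimal sketch. It assumes one common orthonormal (Helmert-type) basis; this excerpt does not state which basis the authors use, and the function names `closure`, `ilr_basis`, and `ilr` are illustrative rather than taken from the paper.

```python
import numpy as np


def closure(x):
    """Rescale positive parts so that each composition sums to 1."""
    x = np.asarray(x, dtype=float)
    return x / x.sum(axis=-1, keepdims=True)


def ilr_basis(D):
    """Build a (D-1) x D orthonormal basis of the zero-sum (clr) hyperplane.

    This Helmert-type construction is one common choice, assumed here
    for illustration only.
    """
    V = np.zeros((D - 1, D))
    for i in range(1, D):
        V[i - 1, :i] = 1.0 / i          # first i parts share equal weight
        V[i - 1, i] = -1.0              # contrasted against part i + 1
        V[i - 1] *= np.sqrt(i / (i + 1.0))  # normalize the row to unit length
    return V


def ilr(x):
    """Map D-part compositions to D-1 unconstrained real coordinates."""
    x = closure(x)
    log_x = np.log(x)
    # Centered log-ratio: subtract the mean log (log of the geometric mean).
    clr = log_x - log_x.mean(axis=-1, keepdims=True)
    return clr @ ilr_basis(x.shape[-1]).T


if __name__ == "__main__":
    portfolio = np.array([0.2, 0.3, 0.5])  # hypothetical stock weights
    print(ilr(portfolio))
```

Because the coordinates depend only on ratios between parts, rescaling a composition (for example, from proportions to percentages) leaves them unchanged, and ordinary Euclidean tools such as a clustering loss can then be applied to the transformed data.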
Communicated by V. Loia.

Corresponding author: Zhichao Wang [email protected]
1 School of Economics and Management, Beihang University, Beijing, China
2 Beijing Advanced Innovation Center for Big Data and Brain Computing, Beijing, China
3 Postdoctoral Research Center, Industrial and Commercial Bank of China, Beijing, China
4 School of Computer and Information Technology, Beijing Jiaotong University, Beijing, China

1 Introduction
We are currently surrounded by massive numbers of sensor networks (Zhang et al. 2014, 2015), power systems (Zhang and Zhang 2012), transportation systems (Duan et al. 2018), and communication networks (Zhang et al. 2018; Liu et al. 2019) generating significant amounts of data with different characteristics. In this study, we are mainly interested in one particular type of data, compositional data, which are usually expressed in proportions or percentages. Compositional data convey structural information that quantitatively refers to parts of a whole, carrying only relative information (Aitchison 1982, 1986). Because proportions are expressed as real numbers, interpreting or analyzing them in raw form, without adequately considering the unit-sum constraint, can lead to misinterpretations and false conclusions (Templ et al. 2013). Clustering analysis is a type of unsupervised learning approach for st