Convex clustering method for compositional data modeling


METHODOLOGIES AND APPLICATION

Xiaokang Wang1 · Huiwen Wang1,2 · Zhichao Wang3 · Jidong Yuan4

© Springer-Verlag GmbH Germany, part of Springer Nature 2020

Abstract Compositional data are vectors whose parts are positive and subject to a constant-sum constraint. Real-world examples include a vector in which each entry represents the weight of a stock in an investment portfolio, or the relative concentrations of air pollutants in the environment. In this study, we develop a convex clustering approach for grouping compositional data. Convex clustering is attractive because, as a convex relaxation of hierarchical clustering, it yields a globally optimal solution. However, when applied directly to compositions, it ignores the unit-sum constraint of compositional data, so the clustering result offers little interpretability. We therefore discuss the clustering of compositional variables in the Aitchison framework with an isometric log-ratio (ilr) transformation. The objective function is formulated as a combination of an L2-norm loss term and an L1-norm regularization term and is solved efficiently using the alternating direction method of multipliers (ADMM). In numerical simulations, clustering ilr-transformed data is more accurate than directly clustering untransformed compositional data. To demonstrate its practical use, the proposed method is also tested on several real-world datasets.

Keywords Compositional data analysis · Aitchison geometry · Convex clustering · Alternating direction method of multipliers (ADMM)
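The ilr transformation named in the abstract can be illustrated with a minimal sketch. It assumes NumPy and one particular Helmert-type orthonormal basis; the paper's actual choice of basis is not given in this excerpt, and the function names (closure, ilr_basis, ilr, ilr_inv) are hypothetical. The point is only that ilr coordinates are unconstrained real vectors on which Euclidean methods, such as an L2-norm loss, can operate.

```python
import numpy as np

def closure(x):
    """Rescale positive parts so that each composition sums to 1."""
    x = np.asarray(x, dtype=float)
    return x / x.sum(axis=-1, keepdims=True)

def ilr_basis(D):
    """One orthonormal (Helmert-type) basis of the clr hyperplane, shape (D-1, D).
    This specific basis is an assumption; any orthonormal basis would do."""
    V = np.zeros((D - 1, D))
    for i in range(1, D):
        V[i - 1, :i] = 1.0 / i              # first i parts share equal weight
        V[i - 1, i] = -1.0                  # contrasted against part i+1
        V[i - 1] *= np.sqrt(i / (i + 1.0))  # scale the row to unit length
    return V

def ilr(x):
    """Map D-part compositions to D-1 unconstrained real coordinates."""
    x = closure(x)
    log_x = np.log(x)
    clr = log_x - log_x.mean(axis=-1, keepdims=True)  # centered log-ratio
    return clr @ ilr_basis(x.shape[-1]).T

def ilr_inv(z):
    """Map ilr coordinates back to compositions on the unit simplex."""
    z = np.asarray(z, dtype=float)
    D = z.shape[-1] + 1
    return closure(np.exp(z @ ilr_basis(D)))

# Two 3-part compositions, e.g. portfolio weights.
X = np.array([[0.2, 0.3, 0.5],
              [0.1, 0.1, 0.8]])
Z = ilr(X)            # shape (2, 2): real-valued, no sum constraint
X_back = ilr_inv(Z)   # recovers X up to floating-point error
```

Because the map is an isometry, Euclidean distances between rows of Z equal Aitchison distances between the corresponding compositions, which is why a clustering method built on Euclidean geometry can be applied to Z rather than to the raw proportions.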

Communicated by V. Loia.

Zhichao Wang [email protected]

1 School of Economics and Management, Beihang University, Beijing, China
2 Beijing Advanced Innovation Center for Big Data and Brain Computing, Beijing, China
3 Postdoctoral Research Center, Industrial and Commercial Bank of China, Beijing, China
4 School of Computer and Information Technology, Beijing Jiaotong University, Beijing, China

1 Introduction

We are currently surrounded by massive numbers of sensor networks (Zhang et al. 2014, 2015), power systems (Zhang and Zhang 2012), transportation systems (Duan et al. 2018), and communication networks (Zhang et al. 2018; Liu et al. 2019) generating significant amounts of data with different characteristics. In this study, we are mainly interested in one particular type of data, compositional data, which are usually expressed in proportions or percentages. Compositional data convey structural information that quantitatively refers to parts of a whole, carrying only relative information (Aitchison 1982, 1986). As proportions are expressed as real numbers, if one interprets or analyzes them in the raw form, without adequately considering the unit-sum constraint, it can lead to misinterpretations and false conclusions (Templ et al. 2013). Clustering analysis is a type of unsupervised learning approach for st
