K-means tree: an optimal clustering tree for unsupervised learning
- PDF / 2,640,101 Bytes
- 28 Pages / 439.37 x 666.142 pts Page_size
- 21 Downloads / 270 Views
K‑means tree: an optimal clustering tree for unsupervised learning Pooya Tavallali1 · Peyman Tavallali2 · Mukesh Singhal1 Accepted: 14 September 2020 © Springer Science+Business Media, LLC, part of Springer Nature 2020
Abstract Tree construction is one of the popular methods for tackling any supervised task in machine learning. However, there has been little effort in applying trees for unsupervised tasks. The traditional unsupervised trees are based on recursively partitioning the space such that the achieved partitions contain similar samples. Sense of similarity depends on the models and applications. This paper tackles the issue of learning optimal clustering oblique trees for the first time and proposes a linear time algorithm for training it. Optimizing performance of infrastructures and energy consumption in the field of Internet of things can be mentioned as applications of tree and clustering, respectively. The motivation of unsupervised tree models is to preserve the data manifold, while keeping the query, while keeping the query time fast. Popular unsupervised models consist of k-d trees, random projection (RP trees), principal component analysis trees (PCA trees) and clustering trees. However, all existing methods for unsupervised tree are sub-optimal. Additionally, existing clustering trees are limited to axis-aligned trees. Further, some of the mentioned methods suffer from curse of dimensionality such as k-d trees. Despite the mentioned challenges, trees are fast in query time. On the other hand, a non-hierarchical clustering such as k-means has both: It performs well in high-dimensional problems and is locally optimal. Its learning algorithm is efficient. However, k-means clustering is not fast in query time. To address the mentioned issues, this paper proposes a novel k-means tree, a tree that outputs the centroids of clusters. The advantages of such tree are being fast in query time and also learning as good cluster centroids as k-means. As a result, problem of learning such trees is to learn both centroids and the tree parameters optimally and jointly. In this paper, this problem is first cast as a constrained minimization problem and then solved using quadratic penalty method. The method consists of learning clusters from k-means and gradually adapting centroids to the outputs of an optimal oblique tree. The alternating optimization is used, and alternation steps consist of weighted k-means clustering and tree optimization. Additionally, the training complexity of proposed algorithm is efficient. Proposed algorithm is optimal in the sense of learned clusters and tree jointly. Trees used in the k-means tree are oblique, and as per our knowledge, this is the first time that Extended author information available on the last page of the article
13
Vol.:(0123456789)
P. Tavallali et al.
oblique trees are applied to the task of clustering. As a side product of the proposed method, sample reduction is explored and shown its merits. It is shown that computational complexity of training KMT (K-mea
Data Loading...