DeepECT: The Deep Embedded Cluster Tree



Dominik Mautz¹ · Claudia Plant² · Christian Böhm³

Received: 2 April 2020 / Revised: 26 June 2020 / Accepted: 26 June 2020
© The Author(s) 2020

Abstract
The idea of combining the high representational power of deep learning techniques with clustering methods has gained much attention in recent years. Optimizing a clustering objective and the dataset representation simultaneously has been shown to be advantageous over optimizing them separately. So far, however, all proposed methods have used a flat clustering strategy, with the actual number of clusters known a priori. In this paper, we propose the Deep Embedded Cluster Tree (DeepECT), the first divisive hierarchical embedded clustering method. The cluster tree does not need to know the actual number of clusters during optimization. Instead, the level of detail to be analyzed can be chosen afterward and for each sub-tree separately. An optional data-augmentation-based extension allows DeepECT to ignore prior-known invariances of the dataset, such as affine transformations in image data. We evaluate and show the advantages of DeepECT in extensive experiments.

Keywords: Embedded clustering · Hierarchical clustering · Autoencoder · Deep learning

Abbreviations
ACC	Clustering accuracy
AE	Autoencoder
AE + Complete	AE combined with agglomerative clustering with complete linkage
AE + Single	AE combined with agglomerative clustering with single linkage
DEC	Deep Embedded Cluster algorithm [3]
IDEC	Improved Deep Embedded Cluster algorithm [5]
DeepECT	Deep Embedded Cluster Tree
DeepECT + Aug	DeepECT with the optional augmentation extension
DP	Dendrogram purity
Eq.	Equation
LP	Leaf purity
NMI	Normalized mutual information
ReLU	Rectified linear unit
URL	Uniform resource locator

* Corresponding author: Dominik Mautz, [email protected]
Claudia Plant, [email protected]
Christian Böhm, [email protected]

1 LMU München, Munich, Germany
2 Faculty of Computer Science, ds:UniVie, University of Vienna, Vienna, Austria
3 MCML, LMU München, Munich, Germany

1 Introduction

Clustering algorithms are a fundamental tool for data mining tasks. However, of similar importance is the representation of the data to be clustered, and this, in turn, depends on the data domain. In the last decade, deep learning techniques have achieved breakthroughs in areas that were previously very challenging for machine learning and data mining methods. These areas include images, graph structures, text, video, and audio. Many of these success stories have been made in the context of supervised learning. Furthermore, neural-network-based, unsupervised representation learning has made it possible to embed these challenging domains into spaces more accessible to classical data mining methods. In recent years, the idea of simultaneously optimizing a clustering objective and the dataset representation has gained more traction. In this work, we call these methods either embedded clustering or deep clustering. The combined optimization holds the promise of improved results
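The two-stage pipeline sketched above, embedding the data with an autoencoder and then running a classical clustering method on the embedded points, corresponds to the "AE + Complete" baseline named in the abbreviation list. The following is a minimal sketch under simplifying assumptions: the toy dataset, the deliberately tiny linear autoencoder, and all hyperparameters are illustrative choices, not the paper's architecture or experimental setup.

```python
# Sketch of the "AE + Complete" baseline: first embed the data with an
# autoencoder, then run classical agglomerative clustering with complete
# linkage on the embedding. The autoencoder is a tiny linear one trained
# with plain gradient descent (illustrative only, not the paper's setup).
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(0)

# Toy dataset: two well-separated Gaussian blobs in 10 dimensions.
X = np.vstack([rng.normal(0.0, 0.2, (50, 10)),
               rng.normal(4.0, 0.2, (50, 10))])
X = (X - X.mean(axis=0)) / X.std(axis=0)  # standardize features

d_in, d_emb = X.shape[1], 2
W_enc = rng.normal(0, 0.1, (d_in, d_emb))  # encoder weights
W_dec = rng.normal(0, 0.1, (d_emb, d_in))  # decoder weights

lr = 0.01
for _ in range(300):
    Z = X @ W_enc                  # embed the data
    err = Z @ W_dec - X            # reconstruction error
    # Gradient steps on the mean squared reconstruction loss.
    g_dec = Z.T @ err / len(X)
    g_enc = X.T @ (err @ W_dec.T) / len(X)
    W_dec -= lr * g_dec
    W_enc -= lr * g_enc

# Classical hierarchical clustering on the learned embedding.
tree = linkage(X @ W_enc, method="complete")
labels = fcluster(tree, t=2, criterion="maxclust")
```

Note that in this two-stage scheme the representation is fixed before clustering begins, so the clustering objective cannot influence the embedding; embedded clustering methods such as DeepECT instead optimize both simultaneously, and the resulting cluster tree can be cut at the desired level of detail afterward.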