Learning representations from dendrograms
Morteza Haghir Chehreghani¹ · Mostafa Haghir Chehreghani²

Received: 11 November 2019 / Revised: 11 May 2020 / Accepted: 6 July 2020
© The Author(s) 2020
Abstract

We propose unsupervised representation learning and feature extraction from dendrograms. The commonly used Minimax distance measures correspond to building a dendrogram with the single-linkage criterion and defining specific forms of a level function and a distance function over it. We therefore extend this method to arbitrary dendrograms. We develop a generalized framework wherein different distance measures and representations can be inferred from different types of dendrograms, level functions and distance functions. Via an appropriate embedding, we compute a vector-based representation of the inferred distances, in order to enable many numerical machine learning algorithms to employ such distances. Then, to address the model selection problem, we study the aggregation of different dendrogram-based distances respectively in solution space and in representation space, in the spirit of deep representations. In the first approach, for example for the clustering problem, we build a graph with positive and negative edge weights according to the consistency of the clustering labels of different objects among different solutions, in the context of ensemble methods. We then use an efficient variant of correlation clustering to produce the final clusters. In the second approach, we investigate the combination of different distances and features sequentially, in the spirit of multi-layered architectures, to obtain the final features. Finally, we demonstrate the effectiveness of our approach via several numerical studies.

Keywords Representation learning · Unsupervised learning · Ensemble method · Feature extraction · Dendrogram
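As a minimal illustration of the single-linkage special case mentioned above (a sketch, not the authors' implementation): the Minimax distance between two objects is the smallest possible maximum edge weight over all paths connecting them in the pairwise-distance graph, which equals the maximum edge weight on their path in a minimum spanning tree. The function name `minimax_distances` and the use of Euclidean base distances are illustrative assumptions.

```python
import numpy as np
from scipy.sparse.csgraph import minimum_spanning_tree
from scipy.spatial.distance import pdist, squareform


def minimax_distances(X):
    """Pairwise Minimax distances over the Euclidean distance graph of X.

    The Minimax distance between i and j is min over paths of the max
    edge weight, which coincides with the max edge weight on the i-j
    path in a minimum spanning tree (the single-linkage dendrogram).
    """
    D = squareform(pdist(X))                  # dense pairwise distances
    mst = minimum_spanning_tree(D).toarray()  # sparse MST, one direction
    mst = np.maximum(mst, mst.T)              # symmetrize adjacency
    n = len(X)
    M = np.zeros((n, n))
    # From each source, propagate the max edge weight along tree paths.
    for s in range(n):
        visited = {s}
        stack = [s]
        while stack:
            u = stack.pop()
            for v in np.nonzero(mst[u])[0]:
                if v not in visited:
                    visited.add(v)
                    M[s, v] = max(M[s, u], mst[u, v])
                    stack.append(v)
    return M
```

For three collinear points at 0, 1 and 10, the MST edges have weights 1 and 9, so the Minimax distance between the two extreme points is 9 rather than the Euclidean 10: the measure follows the "elongated" connectivity structure, which is exactly why it is useful for cluster analysis.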
Editors: Ira Assent, Carlotta Domeniconi, Aristides Gionis, Eyke Hüllermeier.

* Morteza Haghir Chehreghani [email protected]
  Mostafa Haghir Chehreghani [email protected]

1 Department of Computer Science and Engineering, Chalmers University of Technology, Gothenburg, Sweden
2 Department of Computer Engineering, Amirkabir University of Technology (Tehran Polytechnic), Tehran, Iran
Machine Learning
1 Introduction

Real-world datasets often contain complex and a priori unknown patterns and structures, which require improving the basic representation. Kernel methods are commonly used for this purpose (Hofmann et al. 2008; Shawe-Taylor and Cristianini 2004). However, their applicability is confined by several limitations (von Luxburg 2007; Nadler and Galun 2007; Chehreghani 2017b). (1) Finding the optimal parameter(s) of a kernel function is often nontrivial, in particular in an unsupervised learning task such as clustering, where no labeled data is available for cross-validation. (2) The proper values of the parameters usually lie inside a very narrow range, which makes cross-validation critical even in the presence of labeled data. To overcome