Interactive clustering: a scoping review
Thais Rodrigues Neubauer¹ · Sarajane Marques Peres¹ · Marcelo Fantinato¹ · Xixi Lu² · Hajo Alexander Reijers²
© Springer Nature B.V. 2020
Abstract
We present a scoping review of the interactive clustering area. Interactive clustering has been applied to leverage the strengths of both unsupervised and supervised learning. In interactive clustering, supervision takes the form of human expert knowledge inserted into an originally unsupervised data analysis process. This scoping review aimed to organize the knowledge on (i) the applicability of interactive clustering methods, (ii) the clustering algorithms used to support interactive clustering, (iii) how expert supervision is modeled, and (iv) the effects of expert supervision on the results produced. A systematic search for related literature was conducted in the Scopus database, resulting in the selection of 50 primary studies published by 2018. The analysis of these studies allowed us to identify trends such as: application to text/image data; use of partitioning and hierarchical algorithms; application of strategies based on split/merge, pairwise constraints, similarity metric learning and data reassignment; and concern with visualization. In addition, we identified relevant issues that have not yet been adequately addressed, such as: the evaluation of expert supervision; the evaluation of the expert's effort; and the conduct of studies that actually involve human experts instead of computer simulations.

Keywords Interactive clustering · Active learning · Human-in-the-loop · Clustering · Expert supervision · User supervision
* Thais Rodrigues Neubauer
¹ University of São Paulo, São Paulo, Brazil
² Utrecht University, Utrecht, The Netherlands
1 Introduction
Interactive clustering is a data analysis approach that includes a human expert in key decisions of the clustering process (Hu et al. 2014; Schwenker and Trentin 2014). Including an expert in the loop of unsupervised data analysis aims to achieve higher-quality results, or results that are aligned with the specific needs of a particular user or scenario. In unsupervised analysis, the difficulty of obtaining high-quality or compliant results usually stems from the assumptions or arbitrary decisions made when parameterizing the clustering algorithms. These assumptions and arbitrary decisions may not correspond to the actual distributions of the data under analysis. In addition, they rarely capture all the relevant information that the data represent within the real problem in which they are generated (Lei et al. 2017). Although the term interactive clustering has only recently become more popular, one of the first scientific works to address the systematic insertion of human kno