Homogeneous Pools to Heterogeneous Ensembles for Unsupervised Outlier Detection


and Rajeev Kumar

School of Computer and Systems Sciences, Jawaharlal Nehru University, New Delhi, India

Abstract. We present a member selection method for ensembles of unsupervised outlier detectors. The key challenge of ensemble construction in the unsupervised scenario is the absence of a labeled training set. Thus, an alleged outlier set needs to be formed first. Existing methods construct this set, called the target set, from all the random detectors given as input. We argue that a target formed this way may itself be erroneous, since it is built from unfiltered random results. Hence, complete reliance on it may mislead the entire selection process. Herein, we propose an ensemble construction approach, HEnS (Heterogeneous Ensemble Selector), which selects members from different sets of homogeneous detectors to build a heterogeneous ensemble. Heterogeneity is ensured, to induce diversity and form a good target outlier set, first by considering at input only those detectors that are characteristically distinct, and second by selecting a minimal subset of detectors of each type. Accuracy is maintained first by pruning the relatively highly inaccurate detectors out of the process, and second by forming a better target that includes only two extreme parameter instantiations from each type. This work primarily aims at building a heterogeneous ensemble that comprises the best or relatively better detectors from each of the homogeneous groups. Experimental results on benchmark datasets show notably improved prediction accuracy of ensembles constructed using the proposed method, which in turn supports our claim of better target construction and enhanced diversity.

Keywords: Outlier detection · Outlier ensembles · Ensemble member selection
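The abstract only outlines the selection pipeline, so the following is a rough sketch of its three stated steps (target from two extreme parameter instantiations per type, pruning of poorly agreeing detectors, one selected member per type), not the authors' algorithm. All function names are hypothetical, and the concrete choices are assumptions made for illustration: z-score normalization, Pearson correlation as the agreement measure, a global pruning fraction (0 <= prune_fraction < 1), and score averaging as the combination rule.

```python
from statistics import mean, pstdev


def zscore(row):
    """Standardize one detector's scores to zero mean and unit variance."""
    mu, sd = mean(row), pstdev(row)
    return [(x - mu) / sd if sd > 0 else 0.0 for x in row]


def pearson(a, b):
    """Pearson correlation of two score vectors (0.0 if either is constant)."""
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5 if va > 0 and vb > 0 else 0.0


def build_target(pools):
    """Assumed target: mean standardized score of the first and last
    (extreme-parameter) instantiation of every detector type."""
    extremes = [zscore(rows[i]) for rows in pools.values() for i in (0, -1)]
    return [mean(col) for col in zip(*extremes)]


def select_members(pools, prune_fraction=0.25):
    """Return ({detector type: index of chosen instantiation}, target)."""
    target = build_target(pools)
    agreement = {
        name: [pearson(zscore(row), target) for row in rows]
        for name, rows in pools.items()
    }
    # Assumed pruning rule: drop the globally least-agreeing detectors.
    ranked = sorted(c for cs in agreement.values() for c in cs)
    cutoff = ranked[int(prune_fraction * len(ranked))]
    chosen = {}
    for name, cs in agreement.items():
        survivors = [i for i, c in enumerate(cs) if c >= cutoff]
        if survivors:  # a type may be pruned away entirely
            chosen[name] = max(survivors, key=lambda i: cs[i])
    return chosen, target


def combine(pools, chosen):
    """Average the standardized scores of the selected members."""
    selected = [zscore(pools[name][i]) for name, i in chosen.items()]
    return [mean(col) for col in zip(*selected)]


# Toy data: rows are parameter instantiations ordered by parameter value,
# columns are data points; the last point is the planted outlier.
pools = {
    "knn": [
        [1.0, 1.1, 0.9, 1.0, 1.2, 0.8, 1.0, 5.0],
        [1.1, 1.0, 1.0, 0.9, 1.1, 0.9, 1.0, 4.8],
        [1.0, 1.2, 0.8, 1.0, 1.0, 1.1, 0.9, 5.2],
    ],
    "lof": [
        [0.9, 1.0, 1.1, 1.0, 1.0, 1.0, 1.0, 3.0],
        [2.0, 0.5, 1.8, 0.4, 1.9, 0.6, 1.7, 0.5],  # deliberately unreliable
        [1.0, 1.0, 0.9, 1.1, 1.0, 1.0, 1.0, 3.2],
    ],
}
chosen, target = select_members(pools)
scores = combine(pools, chosen)
print(chosen)
print(scores.index(max(scores)))
```

In this toy setup the unreliable "lof" instantiation sits between the two extremes, so it never enters the target and is pruned for disagreeing with it. If an extreme instantiation were itself unreliable, this simple target would inherit its errors, which is one reason the pruning and target-formation rules here should be read as placeholders.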

© Springer Nature Singapore Pte Ltd. 2020 C. Badica et al. (Eds.): ICICCT 2020, CCIS 1170, pp. 284–295, 2020. https://doi.org/10.1007/978-981-15-9671-1_25

1 Introduction

Outlier detection is the task of finding abnormal data observations. Unmasking outliers provides important insights into unusual system activity, e.g. fraudulent transactions in a banking system. Since outliers are rare data patterns, their detection is highly challenging. Additionally, outliers can differ notably from each other in their characteristics, and hence may not all be discovered by any single detection method. Therefore, ensemble learning has been used in recent years for outlier detection [1–5]. The primary goal of ensemble learning is to obtain more robust results with the help of diverse and accurate members. Thus, an ensemble that comprises randomly chosen members is not preferable, and hence members are first selected based on some criteria before combination. In the context of outlier detection, member selection for ensembles is extremely difficult because of the unsupervised nature of the problem. For the same reason, namely the unavailability of samples with known class labels, member selection methods for classification ensembles are also not applicable her