Anomaly detection with inexact labels



Tomoharu Iwata1 · Machiko Toyoda2 · Shotaro Tora2 · Naonori Ueda1

Received: 18 September 2019 / Revised: 12 April 2020 / Accepted: 24 April 2020
© The Author(s), under exclusive licence to Springer Science+Business Media LLC, part of Springer Nature 2020

Abstract

We propose a supervised anomaly detection method for data with inexact anomaly labels, where each label, which is assigned to a set of instances, indicates that at least one instance in the set is anomalous. Although many anomaly detection methods have been proposed, they cannot handle inexact anomaly labels. To measure the performance with inexact anomaly labels, we define the inexact AUC, which is our extension of the area under the ROC curve (AUC) for inexact labels. The proposed method trains an anomaly score function so that the smooth approximation of the inexact AUC increases while anomaly scores for non-anomalous instances become low. We model the anomaly score function by a neural network-based unsupervised anomaly detection method, e.g., autoencoders. The proposed method performs well even when only a small number of inexact labels are available by incorporating an unsupervised anomaly detection mechanism with inexact AUC maximization. Using various datasets, we experimentally demonstrate that our proposed method improves the anomaly detection performance with inexact anomaly labels, and outperforms existing unsupervised and supervised anomaly detection and multiple instance learning methods.

Keywords Anomaly detection · Inexact labels · AUC maximization
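The abstract names the inexact AUC without stating its formula. As a rough illustration only, the sketch below assumes one natural reading: a labeled set counts as correctly ranked against a non-anomalous instance when the highest anomaly score in the set exceeds that instance's score. The function names `inexact_auc` and `smooth_inexact_auc`, the toy scores, and the choice of a logistic sigmoid as the smooth surrogate for the step function are all assumptions made for this example, not details taken from the text above.

```python
import math
from typing import Sequence


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def inexact_auc(bag_scores: Sequence[Sequence[float]],
                normal_scores: Sequence[float]) -> float:
    """Assumed definition: fraction of (set, non-anomalous instance) pairs
    in which the set's maximum score exceeds the instance's score."""
    wins = 0
    for bag in bag_scores:
        top = max(bag)  # "at least one anomalous" -> use the best-scoring member
        wins += sum(1 for s in normal_scores if top > s)
    return wins / (len(bag_scores) * len(normal_scores))


def smooth_inexact_auc(bag_scores: Sequence[Sequence[float]],
                       normal_scores: Sequence[float]) -> float:
    """Same pairwise comparison with the step function replaced by a sigmoid
    (an assumed surrogate), so the value varies smoothly with the scores."""
    total = 0.0
    for bag in bag_scores:
        top = max(bag)
        for s in normal_scores:
            total += sigmoid(top - s)
    return total / (len(bag_scores) * len(normal_scores))


# Hypothetical anomaly scores: two inexactly labeled sets, three normal instances.
bags = [[0.1, 0.9, 0.2], [0.3, 0.4]]
normals = [0.05, 0.35, 0.5]

print(inexact_auc(bags, normals))         # 5 of the 6 pairs are ranked correctly
print(smooth_inexact_auc(bags, normals))  # smooth value strictly between 0 and 1
```

In this toy case the second set's maximum score (0.4) falls below one non-anomalous score (0.5), which is the single pair that the hard-count version gets wrong.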

1 Introduction

Anomaly detection, the task of finding anomalous instances in a dataset, is an important machine learning task. It has been used in a wide variety of applications (Chandola et al. 2009; Patcha and Park 2007; Hodge and Austin 2004), such as network intrusion detection for cyber-security (Dokas et al. 2002; Yamanishi et al. 2004), fraud detection for credit cards (Aleskerov et al. 1997), defect detection in industrial machines (Fujimaki et al. 2005; Idé and Kashima 2004), and disease outbreak detection (Wong et al. 2003).

Editor: Jesse Davis.

* Tomoharu Iwata [email protected]

1 NTT Communication Science Laboratories, Kyoto, Japan
2 NTT Software Innovation Center, Tokyo, Japan






Machine Learning

Many unsupervised anomaly detection methods have been proposed (Breunig et al. 2000; Schölkopf et al. 2001; Liu et al. 2008; Sakurada and Yairi 2014). When anomaly labels, which indicate whether each instance is anomalous, are given, the anomaly detection performance can be improved (Singh and Silakari 2009; Mukkamala et al. 2005; Rapaka et al. 2003; Nadeem et al. 2016; Gao et al. 2006; Das et al. 2016, 2017). However, in some situations it is difficult to attach exact anomaly labels. Consider an example from server system failure detection, where the server log at each timestep is an instance, and we want to classify each instance as anomalous (system failure) or non-anom
