The Right to Be Forgotten: Towards Machine Learning on Perturbed Knowledge Bases

B. Malle et al.

1 Holzinger Group HCI-KDD, Institute for Medical Informatics, Statistics and Documentation, Medical University Graz, Graz, Austria, {b.malle,a.holzinger}@hci-kdd.org
2 SBA Research gGmbH, Favoritenstraße 16, 1040 Vienna, Austria, [email protected]

Abstract. Today’s increasingly complex information infrastructures form the basis of the data-driven industries that are rapidly becoming the 21st century’s economic backbone. The sensitivity of those infrastructures to disturbances in their knowledge bases is therefore of crucial interest to companies, organizations, customers and regulating bodies. This holds true for the direct provisioning of such information in critical applications like clinical settings or the energy industry, but also for the additional insights, predictions and personalized services that are enabled by the automatic processing of those data. In light of the new EU Data Protection regulations applying from 2018 onwards, which give customers the right to have their data deleted on request, information processing bodies will have to react to these changing jurisdictional (and therefore economic) conditions. Their choices include a re-design of their data infrastructure as well as preventive measures like anonymization of databases by default. Insights into the effects of perturbed/anonymized knowledge bases on the quality of machine learning results are therefore a crucial basis for successfully facing those future challenges. In this paper we introduce a series of experiments in which we applied four different classifiers to an established dataset as well as several distorted versions of it, and we present our initial results.

Keywords: Machine learning · Knowledge bases · Right to be forgotten · Perturbation · Anonymization · k-anonymity · SaNGreeA · Information loss · Structural loss · Cost weighing vector · Interactive machine learning

1 Introduction and Motivation for Research

© IFIP International Federation for Information Processing 2016. Published by Springer International Publishing Switzerland 2016. All Rights Reserved. F. Buccafurri et al. (Eds.): CD-ARES 2016, LNCS 9817, pp. 251–266, 2016. DOI: 10.1007/978-3-319-45507-5_17

Privacy aware machine learning [6] is an issue of increasing importance, fostered by anonymization concepts like k-anonymity [14], in which a record is released only if it is indistinguishable from k other entities in the data set. However, k-anonymity depends strongly on spatial locality in order to implement the technique in a statistically robust way; in arbitrarily high dimensions data becomes sparse, so the concept of spatial locality is no longer easy to define. Consequently, it becomes difficult to anonymize the data without an unacceptably high amount of information loss [1]. The problem of k-anonymization is therefore NP-hard on the one hand, while on the other hand the quality of the obtained result can be measured against given factors: k-anonymity means that attributes are suppressed or generalized
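The k-anonymity condition described here can be sketched as a small verification routine. The following is an illustrative toy check only, not the SaNGreeA algorithm discussed later in the paper; the column names, generalized values and choice of quasi-identifiers are hypothetical:

```python
# Toy check of k-anonymity over a set of quasi-identifier columns.
# A table is k-anonymous if every combination of quasi-identifier
# values is shared by at least k records.
from collections import Counter

def is_k_anonymous(records, quasi_identifiers, k):
    """Return True if every quasi-identifier combination occurs >= k times."""
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return all(count >= k for count in groups.values())

# Ages generalized into ranges, ZIP codes suppressed to a prefix
# (hypothetical example data):
records = [
    {"age": "30-39", "zip": "404**", "diagnosis": "flu"},
    {"age": "30-39", "zip": "404**", "diagnosis": "asthma"},
    {"age": "40-49", "zip": "405**", "diagnosis": "flu"},
    {"age": "40-49", "zip": "405**", "diagnosis": "diabetes"},
]

print(is_k_anonymous(records, ["age", "zip"], k=2))  # True
print(is_k_anonymous(records, ["age", "zip"], k=3))  # False
```

Note that the sensitive attribute (here `diagnosis`) is left untouched; only the quasi-identifiers are generalized or suppressed until each equivalence class reaches size k.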