Practical Issues on Privacy-Preserving Health Data Mining


NICTA, Locked Bag 8001, Canberra ACT, 2601, Australia [email protected]
RSISE, the Australian National University, Canberra ACT, 2601, Australia

Abstract. Privacy-preserving data mining techniques could encourage health data custodians to provide accurate information for mining by ensuring that the data mining procedures and results cannot, with any reasonable degree of certainty, violate data privacy. We outline privacy-preserving data mining techniques and systems in the literature and in industry. They range from privacy-preserving data publishing and privacy-preserving (distributed) computation to the privacy-preserving release of data mining results. We discuss their respective strengths and weaknesses, and indicate that there is no perfect technical solution yet. We also provide and discuss a possible development framework for privacy-preserving health data mining systems.

Keywords: Data anonymisation, secure multiparty computation, encryption, privacy inference, health data privacy.

1 Introduction

Health information, according to the Australian Commonwealth Privacy Act [1], is defined to be

1. information or an opinion about:
   (a) the health or a disability (at any time) of an individual; or
   (b) an individual’s expressed wishes about the future provision of health services to him or her; or
   (c) a health service provided, or to be provided, to an individual;
   that is also personal information; or
2. other personal information collected to provide, or in providing, a health service; or
3. other personal information about an individual collected in connection with the donation, or intended donation, by the individual of his or her body parts, organs or body substances.

As important personal information, health information is classified as one type of sensitive information [1].

T. Washio et al. (Eds.): PAKDD 2007 Workshops, LNAI 4819, pp. 64–75, 2007. © Springer-Verlag Berlin Heidelberg 2007

[Figure: components shown are the health database; data sanitisation (de-identification, k-anonymisation); privacy-preserving computation (secure multiparty computation, crypto-based techniques, ...); privacy-preserving results (result perturbation, result restriction); the privacy-preserving analysis controller; and end users.]

Fig. 1. Illustration of privacy-preserving health data mining system

With the development of powerful data mining tools and systems, we face a dilemma: a health data mining system should satisfy user requests for discovering valuable knowledge from databases [2,3,4,5,6], while guarding against the ability to infer any private information about individuals. A third party should not be able to identify an individual person or organisation from the mining procedures or results that we release. Furthermore, information attributable to an individual person or organisation should not be disclosed. We should develop policies, procedures, and new techniques for privacy and confidentiality, with the aim of meeting legislative obligations. We only concentrate