

Random Access Memory (RAM) ▶ Main Memory

Randomization Methods to Ensure Data Privacy

Ashwin Machanavajjhala, Johannes Gehrke
Cornell University, Ithaca, NY, USA

Synonyms
Perturbation techniques

Definition
Many organizations, e.g., government statistical offices and search engine companies, collect potentially sensitive information about individuals, either to publish this data for research or in return for useful services. While some data collection organizations, like the census bureau, are legally required not to breach the privacy of individuals, other data collectors may not be trusted to uphold privacy. Hence, if T denotes the original data containing sensitive information about a set of individuals, then an untrusted data collector or researcher should only have access to an anonymized version of the data, T*, that does not disclose the sensitive information about the individuals. A randomized anonymization algorithm R is said to be a privacy-preserving randomization method if, for every table T and for every output T* = R(T), the privacy of all the sensitive information of each individual in the original data is provably guaranteed.

© 2009 Springer Science+Business Media, LLC

Historical Background
This is a brief survey of state-of-the-art randomization methods. The reader is referred to the classical survey by Adam and Wortmann [1] for a more comprehensive description of older randomization techniques. The existing literature can be classified by whether the individuals sharing the data trust the data collector or not. The untrusted data collector scenario is discussed first.

Randomization methods have historically been used to elicit accurate answers to surveys of sensitive yes/no questions. Respondents may be reluctant to answer such questions truthfully when the data collector (surveyor) is untrusted. Warner's classical paper on randomized response [17] proposed a simple technique in which each individual i independently randomizes his or her answer as follows: i answers truthfully with probability pi, and lies with probability (1 − pi). Randomized response intuitively ensures privacy, since no individual necessarily reports the true value; however, Warner did not formalize this intuition.

Subsequent works [7,2] generalized the above randomized response technique to other domains. Evfimievski et al. [7] studied the problem where individuals share itemsets (e.g., a set of movies rented) with an untrusted server (e.g., an online movie rental company) in return for services (e.g., movie recommendations), and they proposed a formal definition of privacy breaches. They invented a provably private randomization technique where users submit independently randomized itemsets to the server. They also proposed data reconstruction algorithms to help the server mine association rules from these randomized itemsets, and experimentally demonstrated the accuracy of the reconstruction techniques. The above methods are called local randomization techniques, since every individual perturbs his/her data locally before sharing it.
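Warner's randomized response scheme described above can be sketched in a few lines. The following is a minimal illustration, not code from the literature: the function and variable names are hypothetical, and a single truth probability p is assumed for all respondents (the text allows a per-individual pi). It also shows how the surveyor can recover an unbiased estimate of the true "yes" proportion π from the observed randomized answers, since the observed rate λ satisfies λ = pπ + (1 − p)(1 − π).

```python
import random

def randomized_response(true_answer: bool, p: float, rng: random.Random) -> bool:
    """Report the true answer with probability p; lie with probability 1 - p."""
    return true_answer if rng.random() < p else not true_answer

def estimate_true_proportion(observed_yes_rate: float, p: float) -> float:
    """Invert lambda = p*pi + (1-p)*(1-pi) to get pi = (lambda - (1-p)) / (2p - 1).

    Requires p != 0.5; at p = 0.5 the reports carry no information.
    """
    return (observed_yes_rate - (1 - p)) / (2 * p - 1)

# Simulate a survey: each respondent holds a sensitive bit (true with
# probability true_pi) and submits only the randomized version of it.
rng = random.Random(42)
p, true_pi, n = 0.75, 0.3, 100_000
reports = [randomized_response(rng.random() < true_pi, p, rng)
           for _ in range(n)]

observed = sum(reports) / n
pi_hat = estimate_true_proportion(observed, p)
```

With p = 0.75 each individual lies a quarter of the time, so no single report reveals the sensitive bit, yet in aggregate pi_hat converges to the true proportion as n grows.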