From Collaborative to Privacy-Preserving Sequential Pattern Mining

Vishal Kapoor, Pascal Poncelet, Francois Trousset, and Maguelonne Teisseire

Abstract Research on privacy-preserving techniques in databases, and subsequently on privacy enhancement technologies, has grown explosively in recent years. This growth has been fueled primarily by the increasing mistrust of individuals toward organizations that collect and disburse their personally identifiable information (PII). Digital repositories have become increasingly susceptible to intentional or unintentional abuse, leaving organizations liable under the privacy legislation that governments the world over are increasingly adopting. These privacy concerns have necessitated new advances in the field of distributed data mining, wherein collaborating parties may be legally bound not to reveal the private information of their customers. In this chapter, we first present the sequential pattern discovery problem in a collaborative framework and subsequently enhance the architecture by introducing the context of privacy. We thus propose to extract sequential patterns from distributed databases while preserving privacy. A salient feature of the proposal is its flexibility, which makes it more pertinent to mining operations for real-world applications in terms of efficiency and functionality. Furthermore, under some reasonable assumptions, we prove that the architecture and protocol employed by our algorithm for multi-party computation are secure. Finally, we conclude with some trends in current research in the field.

V. Kapoor (B) Microsoft, One Microsoft Way, Redmond, WA - 98052, USA e-mail: [email protected]

J. Nin, J. Herranz (eds.), Privacy and Anonymity in Information Management Systems, Advanced Information and Knowledge Processing, DOI 10.1007/978-1-84996-238-4_7, © Springer-Verlag London Limited 2010

7.1 Introduction

The increasing popularity of multi-database technology, such as communication networks and distributed, federated, and homogeneous multi-database systems, has led to the development of many large distributed transaction databases for real-world applications. For the purposes of decision making, however, large organizations need to mine these distributed databases, which reside at disparate locations. Moreover, the Web has rapidly transformed into an information flood, where individuals and organizations can access free and accurate information and knowledge on the Internet while making decisions. Although this wealth of data helps improve the quality of decisions, it also poses the significant challenge of efficiently identifying quality knowledge from multi-databases [27, 33]. Large corporations might therefore have to confront the multiple data source problem. For example, a retail chain with numerous franchisees might wish to collaboratively mine the union of all the transactional data. The individual transactional databases