Delay-sensitive approaches for anonymizing numerical streaming data
REGULAR CONTRIBUTION
Hessam Zakerzadeh · Sylvia L. Osborn
Published online: 26 March 2013 © Springer-Verlag Berlin Heidelberg 2013
Most of this research was done when the author was an M.Sc. student at the University of Western Ontario.
H. Zakerzadeh (corresponding author) · S. L. Osborn: Department of Computer Science, University of Calgary, 2500 University Dr. NW, Calgary, Alberta T2N 1N4, Canada

Abstract Streaming data are widely used in today’s world. Data come from different sources in streams and must be processed online and with minimum delay. These data streams can contain confidential data, such as customers’ purchase information, and need to be mined in order to reveal other useful information, such as customers’ purchase patterns. Privacy preservation throughout these processes plays a crucial role. K-anonymity is a well-known technique for preserving privacy. The principal issues in k-anonymity are information loss and running time. Although some existing k-anonymity techniques can generate anonymized data with acceptable information loss, their main drawback is that they are very time-consuming and therefore not applicable in a streaming context, since streaming data are usually very sensitive to delay and need to be processed quickly. In [32], we proposed a cluster-based k-anonymity algorithm called fast anonymizing algorithm for numerical streaming data (FAANST), which can anonymize numerical streaming data quickly while providing admissible information loss. The main drawback of FAANST is that some tuples may remain in the system for a long time and are output only when they might already be considered expired. In this paper, we propose two extensions of FAANST: a passive and a proactive solution. Both solutions put a soft deadline, called delay, on the time each tuple can stay in the system; if a tuple passes this deadline, these algorithms force the tuple to be output. The proactive solution goes one step further: it uses a simple heuristic function to predict when a tuple in the system may expire and outputs the tuple if it will expire in the next round of the algorithm’s execution.

Keywords K-anonymity · Privacy-preserving data mining · Streaming data
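The difference between the passive and proactive policies described in the abstract can be illustrated with a short Python sketch. It only shows which buffered tuples would be forced out at a given moment; FAANST’s clustering and the generalization of output tuples are not modeled. The names BufferedTuple, overdue_passive, overdue_proactive and estimate_next_round are hypothetical, and the mean-of-recent-rounds estimate is an assumed stand-in, not the heuristic function used in the paper.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class BufferedTuple:
    """A tuple waiting in the anonymizer's buffer (hypothetical structure)."""
    values: Tuple[float, ...]  # numerical quasi-identifier values
    arrival: float             # arrival timestamp, in seconds


def overdue_passive(buffer: Sequence[BufferedTuple], now: float,
                    delay: float) -> List[BufferedTuple]:
    """Passive policy: select tuples whose waiting time has already passed `delay`."""
    return [t for t in buffer if now - t.arrival > delay]


def estimate_next_round(recent_durations: Sequence[float]) -> float:
    """Hypothetical heuristic: assume the next round of the algorithm lasts
    as long as the average of recently observed rounds (0.0 if none yet)."""
    if not recent_durations:
        return 0.0
    return sum(recent_durations) / len(recent_durations)


def overdue_proactive(buffer: Sequence[BufferedTuple], now: float, delay: float,
                      next_round: float) -> List[BufferedTuple]:
    """Proactive policy: also select tuples predicted to pass `delay`
    before the next round finishes, i.e. by time now + next_round."""
    return [t for t in buffer if (now + next_round) - t.arrival > delay]


if __name__ == "__main__":
    buffer = [
        BufferedTuple((30.0, 52000.0), arrival=0.0),
        BufferedTuple((45.0, 61000.0), arrival=4.0),
        BufferedTuple((27.0, 48000.0), arrival=7.0),
    ]
    now, delay = 10.0, 5.0
    est = estimate_next_round([2.0, 4.0])
    print(est)                                                              # 3.0
    print([t.arrival for t in overdue_passive(buffer, now, delay)])         # [0.0, 4.0]
    print([t.arrival for t in overdue_proactive(buffer, now, delay, est)])  # [0.0, 4.0, 7.0]
```

With a delay of 5 s at time 10 s, the passive check flags only the tuples that have already waited 10 s and 6 s. The proactive check also flags the tuple that has waited 3 s, because with an estimated 3 s round it would have waited 6 s before the algorithm could consider it again.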
1 Introduction
Streaming data are widely used in today’s world. Data from different sources arrive in streams and must be processed online and with minimum delay. Examples of applications that use data streams include financial, network monitoring, security, telecommunications data management, web, and manufacturing applications. Streams from these applications need to be mined in order to obtain important information. As a case in point, online trading companies such as Questrade.com receive and record tens of thousands of online bids issued by their customers every day. Suppose the bids made by bidders constitute a transaction stream