Delay-sensitive approaches for anonymizing numerical streaming data


REGULAR CONTRIBUTION

Hessam Zakerzadeh · Sylvia L. Osborn

Published online: 26 March 2013 © Springer-Verlag Berlin Heidelberg 2013

Most of this research was done when the author was an M.Sc. student at the University of Western Ontario.

H. Zakerzadeh (B) · S. L. Osborn Department of Computer Science, University of Calgary, 2500 University Dr. NW Calgary, Alberta T2N 1N4, Canada e-mail: [email protected] S. L. Osborn e-mail: [email protected]

Abstract Streaming data are widely used in today’s world. Data come from different sources in streams and must be processed online and with minimum delay. These data streams can contain confidential data such as customers’ purchase information and need to be mined in order to reveal other useful information like customers’ purchase patterns. Privacy preservation throughout these processes plays a crucial role. K-anonymity is a well-known technique for preserving privacy. The principal issues in k-anonymity are information loss and running time. Although some of the existing k-anonymity techniques are able to generate anonymized data with acceptable information loss, their main drawback is that they are very time-consuming and are not applicable in a streaming context, since streaming data are usually very sensitive to delay and need to be processed quickly. In [32], we proposed a cluster-based k-anonymity algorithm called fast anonymizing algorithm for numerical streaming data (FAANST), which can anonymize numerical streaming data quickly while providing an admissible information loss. The main drawback of FAANST is that some tuples may remain in the system for a long time and are output when they might be considered to have expired. In this paper, we propose two extensions for FAANST, passive and proactive solutions. These two solutions put a soft deadline, called delay, on the time each tuple can stay in the system, and if a tuple passes this deadline, these algorithms force the tuple to be output. The proactive solution goes one step further and uses a simple heuristic function to predict when a tuple in the system may expire, and outputs the tuple if it will expire in the next round of the algorithm’s execution.

Keywords K-anonymity · Privacy-preserving data mining · Streaming data
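The difference between the passive and proactive deadline checks described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the names `Buffered`, `passive_expired`, `proactive_expired`, and `round_length` are hypothetical, time is assumed to be measured in stream positions, and the "simple heuristic" is replaced here by the assumption that the next round starts a fixed number of positions later.

```python
from dataclasses import dataclass


@dataclass
class Buffered:
    value: float
    arrival: int  # assumed clock: position of the tuple in the stream


def passive_expired(buffer, now, delay):
    """Tuples that have already waited longer than the soft deadline."""
    return [t for t in buffer if now - t.arrival > delay]


def proactive_expired(buffer, now, delay, round_length):
    """Tuples predicted to pass the deadline by the next round.

    Stand-in heuristic (an assumption): the next round is expected
    to run `round_length` stream positions from now.
    """
    next_round = now + round_length
    return [t for t in buffer if next_round - t.arrival > delay]


buffer = [Buffered(1.0, 0), Buffered(2.0, 5), Buffered(3.0, 9)]
late_now = passive_expired(buffer, now=10, delay=8)
late_soon = proactive_expired(buffer, now=10, delay=8, round_length=4)
```

With these inputs the passive check flags only the tuple that arrived at position 0, while the proactive check also flags the tuple that arrived at position 5, because it would exceed the deadline before the next round runs.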

1 Introduction

Streaming data are being widely used in today’s world. Data from different sources come in streams and are required to be processed online and with minimum delay. Examples of applications using data streams are financial applications, network monitoring applications, security applications, telecommunications data management applications, web applications, manufacturing applications, and others. Streams from these applications need to be mined in order to obtain important information. As a case in point, online trading companies such as Questrade.com receive and record tens of thousands of online bids issued by their customers every day. Suppose the bids made by bidders constitute a transaction stream
