Practical Applications in Statistical Disclosure Control Using R
Mathias Templ and Bernhard Meindl
Abstract The aim of this chapter is to show how statistical disclosure control methods can be applied to data using the R packages sdcMicro and sdcTable. The chapter shows how popular methods for microdata protection and tabular protection can be applied to real-world data with these packages. sdcMicro supports an exploratory approach to the anonymization of both categorical key variables and numerical variables: global recoding, local suppression, and risk estimation can be applied interactively. Several popular methods for microdata protection are described briefly, and some new methods for microdata protection and disclosure risk estimation that address real-life data problems are introduced. In addition, the chapter describes how tabular protection can be applied using the R package sdcTable. From the user's point of view, the most challenging part is the data preparation required before tabular protection can be applied, because the user must provide meta information about the hierarchical variables that define the table.
3.1 Microdata Protection Using sdcMicro
Microdata protection has grown extensively in the last few years because of the significant rise in demand for scientific-use files among researchers and institutions. The aim, and in many cases the legal obligation, of data holders who want to disseminate microdata is to provide data in which statistical units can be identified only at disproportionate cost and time.
M. Templ, Department of Statistics and Probability Theory, Vienna University of Technology, Wiedner Hauptstr. 7, A-1040 Vienna, Austria; Department of Register, Classification and Methodology, Statistics Austria, Guglgasse 13, A-1110 Vienna, Austria
J. Nin, J. Herranz (eds.), Privacy and Anonymity in Information Management Systems, Advanced Information and Knowledge Processing, DOI 10.1007/978-1-84996-238-4_3, © Springer-Verlag London Limited 2010
The aim of statistical disclosure control (SDC) is twofold: to reduce the risk of disclosing information on statistical units (individuals, enterprises, organizations) and, at the same time, to provide as much information as possible by minimizing the amount of data modification.
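The trade-off between disclosure risk and data modification can be illustrated with a small base-R sketch. It does not use sdcMicro's own functions; the toy data, the column names, the age bands, and the helper key_freq() are all hypothetical and chosen only for illustration. The sketch counts how many records are unique on their key variables, coarsens one variable (global recoding), and then blanks a single value in the records that are still unique (a crude form of local suppression).

```r
# Toy microdata; all values and column names are hypothetical.
toy <- data.frame(
  age    = c(23, 27, 31, 38, 42, 47, 64, 91),
  sex    = c("f", "m", "f", "f", "m", "m", "f", "m"),
  region = c("A", "A", "A", "B", "B", "B", "B", "B"),
  stringsAsFactors = FALSE
)
keys <- c("age", "sex", "region")

# Hypothetical helper: for every record, the number of records in the
# sample that share the same combination of key-variable values.
key_freq <- function(data, keys) {
  combo <- interaction(data[keys], drop = TRUE)
  as.vector(table(combo)[as.character(combo)])
}

# Risk before protection: every record is unique on (age, sex, region).
freq_raw <- key_freq(toy, keys)
n_unique_raw <- sum(freq_raw == 1)

# Global recoding: replace exact age by two broad bands (assumed cut points).
recoded <- toy
recoded$age <- cut(toy$age, breaks = c(0, 40, 120),
                   labels = c("0-40", "41-120"))
freq_rec <- key_freq(recoded, keys)
n_unique_rec <- sum(freq_rec == 1)

# Crude local suppression: blank one key value (here: sex) in the
# records that are still unique after recoding.
protected <- recoded
protected$sex[freq_rec == 1] <- NA
n_suppressed <- sum(is.na(protected$sex))

cat("sample uniques before recoding:", n_unique_raw, "\n")
cat("sample uniques after recoding: ", n_unique_rec, "\n")
cat("values suppressed:             ", n_suppressed, "\n")
```

Recoding lowers the number of sample uniques at the price of a coarser age variable, and each suppressed value is a further loss of information, which is the balance the paragraph above describes.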
3.1.1 Software Issues
The R package sdcMicro includes the most popular techniques for microdata protection. It is designed to protect survey data including sampling weights, but it can also be applied to data without survey weights (e.g., population data). The underlying code is open source and freely available on the Comprehensive R Archive Network (CRAN, see http://cran.r-project.org). The package can be installed by typing the following command into R (the text after the # is a comment that explains the operation to the reader):
Listing 3.1 Installing the package sdcMicro
install.packages('sdcMi
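As a minimal sketch of the usual CRAN workflow (not the chapter's own listing), a package can be installed only when it is missing and then attached. The helper name ensure_package() and the mirror URL are assumptions made for this example; install.packages(), requireNamespace(), library(), and packageVersion() are standard R functions.

```r
# Hypothetical helper: install a package from CRAN only if it is missing.
# The default mirror URL is an assumption; any CRAN mirror can be used.
ensure_package <- function(pkg, repos = "https://cloud.r-project.org") {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg, repos = repos)  # needs network access
  }
  # TRUE if the package can be loaded after the (possible) installation
  requireNamespace(pkg, quietly = TRUE)
}

# Only install and attach in an interactive session, so that sourcing
# this file in a batch job does not trigger a download.
if (interactive() && ensure_package("sdcMicro")) {
  library(sdcMicro)                  # attach the package
  print(packageVersion("sdcMicro"))  # show which version is loaded
}
```

Checking with requireNamespace() first avoids reinstalling the package on every run.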