Outliers in official statistics
Theory and Practice of Surveys
Kazumi Wada¹
Received: 10 January 2020 / Accepted: 19 September 2020
© The Author(s) 2020
Abstract
The purpose of this manuscript is to survey the important methods for addressing outliers in the production of official statistics. Outliers are often unavoidable in survey statistics. They may reduce the information in survey datasets and distort estimation at each step of the survey statistics production process. This paper defines the outliers of concern at each production step and introduces practical methods to cope with them. The statistical production process is roughly divided into the following three steps. The first step is data cleaning, where the outliers of concern are those that may contain mistakes to be corrected. Robust estimators of a mean vector and covariance matrix are introduced for this purpose. The next step is imputation. Among the variety of imputation methods, this paper covers regression and ratio imputation. The outliers of concern in this step are not erroneous but have extreme values that may distort parameter estimation. Robust estimators that are not affected by the remaining outliers are introduced. The final step is estimation and formatting. Here, care is needed with outliers that combine extreme values with large design weights, since they have a considerable influence on the final statistical products. Weight calibration methods that control this influence are discussed, based on the robust weights obtained in the preceding imputation step. A few examples of practical application are also provided briefly, although the multivariate outlier detection methods introduced in this paper are mostly still at the research stage in the field of official statistics.
Keywords Outlier detection · Robust estimation · MSD · M-estimators · Weight calibration
* Kazumi Wada [email protected]
¹ Statistical Research and Training Institute, Ministry of Internal Affairs and Communications (MIC), 2‑11‑16 Izumi‑cho, Kokubunji‑shi, Tokyo 185‑0024, Japan
Japanese Journal of Statistics and Data Science
1 Introduction
1.1 What are outliers
Outliers are extreme or atypical values that can reduce and distort the information in a dataset. The problem of how to deal with outliers has long been a concern. Barnett and Lewis (1994, p. 3), one of the pioneering books on outlier detection in mathematical statistics, cite Peirce (1852), published more than 150 years ago. Eliminating outliers from estimation carries the risk of losing information, whereas including them carries the risk of contamination. To deal with this problem, Barnett and Lewis (1994, p. 3) devised a principle to accommodate outliers using robust methods of inference, which allows all the data to be used while alleviating the undue influence of outliers. We follow this principle and focus on the robust statistical methods introduced by Huber (1964), which are the most suitable for survey data processing. Therefore, statistical tests are beyond the scope of our d