Outliers in official statistics


Theory and Practice of Surveys

Kazumi Wada

Received: 10 January 2020 / Accepted: 19 September 2020
© The Author(s) 2020

Abstract The purpose of this manuscript is to provide a survey of the important methods for addressing outliers in the production of official statistics. Outliers are often unavoidable in survey statistics. They may reduce the information in survey datasets and distort estimation at each step of the statistical production process. This paper defines the outliers of concern at each production step and introduces practical methods for coping with them. The statistical production process is roughly divided into three steps. The first step is data cleaning, where the outliers of concern are those that may contain errors to be corrected. Robust estimators of a mean vector and covariance matrix are introduced for this purpose. The next step is imputation. Among the variety of imputation methods, this paper covers regression and ratio imputation. The outliers of concern in this step are not erroneous but have extreme values that may distort parameter estimation. Robust estimators that are not affected by the remaining outliers are introduced. The final step is estimation and formatting. Care is needed with outliers that combine extreme values with large design weights, since they have a considerable influence on the final statistical products. Weight calibration methods that control this influence are discussed, based on the robust weights obtained in the preceding imputation step. A few examples of practical application are also briefly provided, although the multivariate outlier detection methods introduced in this paper are mostly still at the research stage in the field of official statistics.

Keywords Outlier detection · Robust estimation · MSD · M-estimators · Weight calibration

* Kazumi Wada, Statistical Research and Training Institute, Ministry of Internal Affairs and Communications (MIC), 2-11-16 Izumi-cho, Kokubunji-shi, Tokyo 185-0024, Japan


Japanese Journal of Statistics and Data Science

1 Introduction

1.1 What are outliers

Outliers are extreme or atypical values that can reduce and distort the information in a dataset. The problem of how to deal with outliers has long been a concern. Barnett and Lewis (1994, p. 3), one of the pioneering books in mathematical statistics dealing with outlier detection, references Pierce (1852), published more than 150 years ago. Eliminating outliers from estimation carries the risk of losing information, whereas including them carries the risk of contamination. To deal with this problem, Barnett and Lewis (1994, p. 3) devised a principle of accommodating outliers using robust methods of inference, which allows all the data to be used while alleviating the undue influence of outliers. We follow this principle and focus on the robust statistical methods introduced by Huber (1964), which are the most suitable for survey data processing. Therefore, statistical tests are beyond the scope of our d
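The accommodation principle described above can be illustrated with a small sketch of a Huber-type M-estimator of location, computed by iteratively reweighted averaging. This is only an illustration under stated assumptions, not the procedure used in the paper: the function name `huber_location`, the tuning constant k = 1.345, the use of the normalized median absolute deviation (MAD) as a fixed scale, and the sample values are all choices made here for the example.

```python
from statistics import mean, median


def huber_weights(x, mu, scale, k):
    """Huber weights: 1 inside the band |x - mu| <= k * scale, k / r outside."""
    weights = []
    for v in x:
        r = abs(v - mu) / scale  # standardized absolute residual
        weights.append(1.0 if r <= k else k / r)
    return weights


def huber_location(values, k=1.345, tol=1e-10, max_iter=200):
    """Huber-type M-estimate of location by iterative reweighting.

    Assumptions (not taken from the paper): k = 1.345 and a fixed
    scale given by 1.4826 * MAD around the median.
    """
    x = [float(v) for v in values]
    mu = median(x)  # robust starting value
    mad = median([abs(v - mu) for v in x])
    scale = 1.4826 * mad
    if scale == 0.0:
        # Degenerate case: at least half the data are identical.
        return mu, [1.0] * len(x)
    for _ in range(max_iter):
        w = huber_weights(x, mu, scale, k)
        new_mu = sum(wi * vi for wi, vi in zip(w, x)) / sum(w)
        converged = abs(new_mu - mu) < tol
        mu = new_mu
        if converged:
            break
    # Report the weights that correspond to the final estimate.
    return mu, huber_weights(x, mu, scale, k)


# Hypothetical data: five typical values and one extreme value.
data = [9.8, 10.1, 10.0, 9.9, 10.2, 55.0]
robust_mu, robust_w = huber_location(data)
print("ordinary mean:", mean(data))
print("Huber estimate:", robust_mu)
print("weights:", robust_w)
```

Every observation stays in the calculation, but the extreme value receives a weight well below 1, so it pulls the estimate far less than it pulls the ordinary mean. Weights of this kind are what the abstract calls robust weights, although the exact weighting scheme here is an assumption.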
