Spatial Outlier Detection Using GAMs and Geographical Information Systems
Abstract
A spatial (local) outlier is a value that differs from its neighbors. Detecting such outliers by the usual methods is complicated, especially when the data refer to many locations. In this paper we propose a different approach, which treats outlying slopes in an interpolation map of the observations as indicators of local outliers. To do this, we transfer geographical properties and tools to the task by means of a Geographical Information System (GIS) analysis. We begin with two completely different techniques for detecting possible spatial outliers: first, using the observations as heights in a map and, second, using the residuals of a robust Generalized Additive Model (GAM) fit. This process yields areas of possible spatial outliers (called hotspots), reducing the set of all locations to a small and manageable set of points. We then compute the probability of so steep a slope at each hotspot after fitting a classical GAM to the observations. Observations with a very low probability of such a slope are finally labelled as spatial outliers.
1 Introduction. Spatial Outliers
A local or spatial outlier [3, 6] is an observation that differs from its neighbors; i.e., z(s0), the value of the variable of interest Z at location s0, is a local outlier if it differs from z(s0 + Δs0), where Δs0 defines a neighborhood of location s0. The usual method for detecting local outliers is somewhat complicated because we first have to define what a neighborhood is, i.e., what “close” means; then we have to select some locations inside the neighborhood and compute and compare the values of Z at these locations.
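As a point of comparison, the usual neighbor-based check described above can be sketched as follows. This is a minimal illustration in Python, not the method proposed in this paper: taking the k nearest locations as the neighborhood, summarizing it by the median, scaling by the MAD, and the names neighbor_differences, flag_local_outliers, k and cutoff are all assumptions made for the example.

```python
import math
from statistics import median


def neighbor_differences(coords, values, k=4):
    """Return z(s0) minus the median of z over the k nearest locations, for each s0.

    Assumption: the neighborhood is the k nearest locations in Euclidean
    distance; ties are broken by location index.
    """
    diffs = []
    for i, (xi, yi) in enumerate(coords):
        # Distance from location i to every other location, nearest first.
        dists = sorted(
            (math.hypot(xi - xj, yi - yj), j)
            for j, (xj, yj) in enumerate(coords)
            if j != i
        )
        neighbors = [values[j] for _, j in dists[:k]]
        diffs.append(values[i] - median(neighbors))
    return diffs


def flag_local_outliers(coords, values, k=4, cutoff=3.0):
    """Return indices whose neighbor difference is large on a robust scale.

    Assumption: the scale is 1.4826 * MAD of the differences and the
    cutoff of 3 is an illustrative choice, not a value from the paper.
    """
    diffs = neighbor_differences(coords, values, k)
    center = median(diffs)
    scale = 1.4826 * median(abs(d - center) for d in diffs)
    if scale == 0:
        # Degenerate case: more than half of the differences are identical.
        return [i for i, d in enumerate(diffs) if d != center]
    return [i for i, d in enumerate(diffs) if abs(d - center) / scale > cutoff]
```

Even in this small form, the sketch requires a neighborhood size and a cutoff to be chosen, and it compares every location with every other one, which illustrates why the approach becomes cumbersome when the data refer to many locations.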
A. García-Pérez (B)
Departamento de Estadística, I.O. y C.N., Universidad Nacional de Educación a Distancia (UNED), Paseo Senda del Rey 9, 28040 Madrid, Spain
e-mail: [email protected]
Y. Cabrero-Ortega
C.A. UNED-Madrid, Madrid, Spain
e-mail: [email protected]
© Springer International Publishing Switzerland 2017
M.B. Ferraro et al. (eds.), Soft Methods for Data Science, Advances in Intelligent Systems and Computing 456, DOI 10.1007/978-3-319-42972-4_31
In the first part of the paper we propose two novel GIS-based techniques for detecting possible local outliers easily and quickly. The first, developed in Sect. 2, builds a geographical map in which the ground heights correspond to the observations. This map of isolated heights is completed by means of a Triangulated Irregular Network (TIN) interpolation. Once the map has been built, local outliers are easily identified as hills with steep slopes. The second technique, developed in Sect. 3, consists in fitting a robust GAM to the observations and then repeating the previous process (interpolation plus detection of outlying slopes) on the residuals of this robust fit. These ideas have been used previously (with some variants) in [5, 10, 12]. Here we extend them by considering a more general model, a GAM, because th