A Clustering Approach for Discovering Intrinsic Clusters in Multivariate Geostatistical Data


Abstract. Multivariate georeferenced data have become omnipresent in many scientific fields and pose substantial analysis challenges. One of them is the grouping of data locations into spatially contiguous clusters so that data locations within the same cluster are similar to one another while clusters differ from each other, in terms of a chosen concept of dissimilarity. In this work, we develop an agglomerative hierarchical clustering approach that takes into account the spatial dependency between observations. It relies on a dissimilarity matrix built from a non-parametric kernel estimator of the multivariate spatial dependence structure of the data. It integrates existing methods to find the optimal number of clusters. The capability of the proposed approach to provide spatially compact, connected and meaningful clusters is illustrated on the National Geochemical Survey of Australia data.

Keywords: Clustering · Geostatistics · Multivariate data · Non-parametric

1 Introduction

Multivariate data indexed by geographical coordinates have become increasingly frequent in scientific disciplines and pose real analysis challenges. A classical problem is the clustering of observations into spatially contiguous groups so that observations in the same group are similar to each other and different from those in other groups, in some sense. Typical examples in the geosciences are [16]: (i) defining climate zones; (ii) determining zones of similar land use; (iii) identifying archaeological sites; (iv) delineating agricultural management areas; (v) establishing ore typologies.

In the non-spatial framework, the problem of clustering observations is well known and described in many textbooks, from descriptive to theoretical viewpoints. There are two principal clustering approaches, namely hierarchical and partitioning. In the hierarchical approach, a tree-like hierarchy is constructed using agglomerative or divisive procedures. In the partitioning approach, observations are divided into clusters once the number of clusters to be formed is specified. Very often, when applied to geostatistical data, these non-spatial clustering algorithms tend to produce spatially scattered clusters. This characteristic is undesirable for many applications (e.g., delineation of agricultural management zones).

F. Fouedjio. © Springer International Publishing Switzerland 2016. P. Perner (Ed.): MLDM 2016, LNAI 9729, pp. 491–500, 2016. DOI: 10.1007/978-3-319-41920-6_39

In the geostatistical framework, a more specific approach is needed. Geostatistical data often show spatial dependency and heterogeneity over the region under study. Observations located close to one another in geographical space tend to have similar characteristics. In addition, the mean, variance and/or spatial dependence structure can differ from one subregion to another. Hence the need to obtain closely related or contiguous clusters of data locations with similar attribute values. The clustering can be achiev