A Clustering Approach for Discovering Intrinsic Clusters in Multivariate Geostatistical Data


Abstract. Multivariate georeferenced data have become omnipresent in many scientific fields and pose substantial analysis challenges. One of them is the grouping of data locations into spatially contiguous clusters so that data locations within the same cluster are similar to each other while clusters differ from one another, with respect to a chosen concept of dissimilarity. In this work, we develop an agglomerative hierarchical clustering approach that takes into account the spatial dependency between observations. It relies on a dissimilarity matrix built from a non-parametric kernel estimator of the multivariate spatial dependence structure of the data. It integrates existing methods to find the optimal number of clusters. The capability of the proposed approach to provide spatially compact, connected and meaningful clusters is illustrated on the National Geochemical Survey of Australia data.

Keywords: Clustering · Geostatistics · Multivariate data · Non-parametric

1 Introduction

Multivariate data indexed by geographical coordinates have become increasingly frequent in scientific disciplines and pose real analysis challenges. A classical problem is the clustering of observations into spatially contiguous groups so that observations in the same group are similar to each other and different from those in other groups, in some sense. Some typical examples in the geosciences are [16]: (i) defining climate zones; (ii) determining zones of similar land use; (iii) identifying archaeological sites; (iv) delineating agricultural management areas; (v) establishing ore typologies.

In the non-spatial framework, the problem of clustering observations is well known and is described in many textbooks, from descriptive to theoretical viewpoints. There are two principal clustering approaches, namely hierarchical and partitioning. In the hierarchical approach, a tree-like hierarchy is constructed using agglomerative or divisive procedures. In the partitioning approach, observations are divided into clusters once the number of clusters to be formed is specified. Very often, when applied to geostatistical data, these non-spatial clustering algorithms tend to produce clusters that are markedly scattered in space. This characteristic is undesirable for many applications (e.g., delineation of agricultural management zones).

In the geostatistical framework, a more specific approach is needed. Geostatistical data often show spatial dependency and heterogeneity over the region under study. Observations located close to one another in geographical space tend to have similar characteristics. In addition, the mean, variance and/or spatial dependence structure can differ from one subregion to another. Hence the need to obtain spatially contiguous clusters of data locations with similar attribute values. The clustering can be achiev

F. Fouedjio. © Springer International Publishing Switzerland 2016. P. Perner (Ed.): MLDM 2016, LNAI 9729, pp. 491–500, 2016. DOI: 10.1007/978-3-319-41920-6_39
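The contrast drawn in the introduction, between non-spatial clustering that scatters clusters in space and clustering that respects spatial proximity, can be illustrated with a minimal sketch. It is not the kernel-based dissimilarity proposed in the paper. As a stand-in, it assumes a simple additive blend of attribute distance and geographic distance, weighted by a hypothetical parameter `alpha`, and feeds it to a plain average-linkage agglomerative procedure. The six locations and their attribute values are invented toy data.

```python
import math

def pairwise(points):
    """Euclidean distance matrix for a list of equal-length tuples."""
    n = len(points)
    return [[math.dist(points[i], points[j]) for j in range(n)] for i in range(n)]

def agglomerate(dissim, n_clusters):
    """Average-linkage agglomerative clustering on a dissimilarity matrix.

    Returns the clusters as sorted lists of observation indices.
    """
    clusters = [[i] for i in range(len(dissim))]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Average dissimilarity between members of the two clusters.
                total = sum(dissim[i][j] for i in clusters[a] for j in clusters[b])
                d = total / (len(clusters[a]) * len(clusters[b]))
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = sorted(clusters[a] + clusters[b])
        del clusters[b]  # safe because a < b
    return sorted(clusters)

def blended_dissimilarity(coords, values, alpha):
    """Illustrative blend (an assumption, not the paper's estimator):
    attribute distance plus alpha times geographic distance."""
    geo = pairwise(coords)
    attr = pairwise(values)
    n = len(coords)
    return [[attr[i][j] + alpha * geo[i][j] for j in range(n)] for i in range(n)]

# Toy data: six locations along a transect, one attribute per location.
coords = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0), (4.0, 0.0), (5.0, 0.0)]
values = [(0.0,), (0.1,), (1.0,), (0.3,), (1.2,), (1.5,)]

# alpha = 0 ignores geography: location 3 is grouped with 0 and 1 even
# though location 2 lies between them, so the cluster is not contiguous.
non_spatial = agglomerate(blended_dissimilarity(coords, values, alpha=0.0), 2)
print(non_spatial)  # [[0, 1, 3], [2, 4, 5]]

# alpha = 1 penalises geographic separation: both clusters are contiguous.
spatial = agglomerate(blended_dissimilarity(coords, values, alpha=1.0), 2)
print(spatial)  # [[0, 1, 2, 3], [4, 5]]
```

The weight `alpha` trades attribute homogeneity against spatial compactness. A blend of this kind is only one of several ways to encourage contiguity, and the appropriate value depends on the units of the coordinates and attributes.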
