Robust Earthquake Cluster Analysis Based on K-Nearest Neighbor Search
Pure and Applied Geophysics
HAMID REZA SAMADI,1 ROOHOLLAH KIMIAEFAR,1,2 and ALIREZA HAJIAN2
1 Department of Physics, Central Tehran Branch, Islamic Azad University, Tehran, Iran. E-mail: [email protected]
2 Department of Physics, Najafabad Branch, Islamic Azad University, Najafabad, Iran.
Abstract: Grouping of earthquakes into distinct clusters is applied to improve mechanism identification and pattern recognition for active seismicity in a region. One of the important issues concerning earthquake data clustering is determining the optimum number of clusters (ONC) at the early stages of algorithms. In this paper a robust method based on K-nearest neighbor search (KNNS) is presented to achieve three goals: improving output accuracy, improving output stability, and adding the ability to weight the features used in ONC determination. By introducing a new formula, the proposed method utilizes the error calculated for clustered data based on the similarity between the members in each cluster. An outlier attenuation algorithm is also used to improve the performance of the method. Both the Krzanowski–Lai Index (KLI) and the silhouette coefficient (SC), as two conventional methods, were used to compare the results and evaluate the performance. Experiments on synthetic data sets verified the effectiveness of the method, with considerable differences found. The clustering of a real earthquake catalogue related to the seismogenic province of Zagros in Persia using our proposed methodology suggests using 13-cluster analysis for clustering based on the spatiotemporal features with the same weights, and seven-cluster analysis for a case where priority is given only to the spatial parameters of the epicenters. Under the same circumstances, the KLI and SC methods suggest three and 18 clusters, respectively. The results of the experiments on synthetic data sets indicate that the proposed method is quantitatively more stable and more accurate than the other two methods.
Keywords: KNN search, earthquake data clustering, number of clusters, outlier data, Zagros.
1. Introduction
Data mining generally refers to the extraction of knowledge from available information, in which the purpose is to discover the hidden patterns in a large database. Given recent advances in seismology data analysis, and thus the production of large data sets, the availability of powerful methods able to analyze a large amount of data is essential (Frawley et al. 1991). Clustering, as a method for data mining, is a technique that involves the grouping of observations into a certain number of clusters (Berkhin 2002). In seismology, although clustering is addressed extensively in seismicity analysis and aftershock identification (Zaliapin et al. 2008), there are other diverse uses including event matching of earthquake catalogues with geological evidence (Hall et al. 2018; Ansari et al. 2009), earthquake risk analysis (Mignan et al. 2016; Nazmfar 2019) and earthquake relocation studi
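To make the idea of scoring a candidate number of clusters concrete, the following is a minimal sketch of the silhouette coefficient (SC), one of the two conventional baselines named in the abstract. It is not the proposed KNNS-based method, whose formula does not appear in this excerpt. The function name mean_silhouette, the synthetic "epicenter" coordinates, and the hand-built candidate partitions are all assumptions made for illustration; a real analysis would obtain the partitions from a clustering algorithm run on catalogue data.

```python
import math
import random


def mean_silhouette(points, labels):
    """Mean silhouette coefficient over all points, using Euclidean distance."""
    clusters = {}
    for idx, label in enumerate(labels):
        clusters.setdefault(label, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("the silhouette coefficient needs at least two clusters")

    total = 0.0
    for i, label in enumerate(labels):
        own = clusters[label]
        if len(own) == 1:
            continue  # usual convention: s(i) = 0 for a singleton cluster
        # a: mean distance from point i to the other members of its own cluster
        a = sum(math.dist(points[i], points[j]) for j in own if j != i) / (len(own) - 1)
        # b: smallest mean distance from point i to the members of any other cluster
        b = min(
            sum(math.dist(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != label
        )
        denom = max(a, b)
        if denom > 0:
            total += (b - a) / denom
    return total / len(points)


# Hypothetical synthetic data: three well-separated 2-D groups of 20 points each.
rng = random.Random(0)
centers = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
points, blob = [], []
for c, (cx, cy) in enumerate(centers):
    for _ in range(20):
        points.append((rng.gauss(cx, 0.5), rng.gauss(cy, 0.5)))
        blob.append(c)

# Hand-built candidate partitions (assumption: stand-ins for clustering output).
candidates = {
    2: [0 if c == 0 else 1 for c in blob],  # merges two of the true groups
    3: list(blob),                          # the true grouping
    4: [3 if (c == 0 and i % 2) else c for i, c in enumerate(blob)],  # splits one group
}

scores = {k: mean_silhouette(points, labels) for k, labels in candidates.items()}
best_k = max(scores, key=scores.get)
print(scores, best_k)
```

In this sketch the candidate number of clusters with the highest mean silhouette is taken as the estimate. The example uses clean, well-separated data, so it says nothing about how SC behaves on noisy catalogues or in the presence of outliers.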