Neural Network for Earthquake Prediction Based on Automatic Clustering in Indonesia

A model of artificial neural networks (ANNs) is presented in this paper to predict aftershock during the next five days after an earthquake occurrence in selected cluster of Indonesia with magnitude equal or larger than given threshold. The data were obtained from Indonesian Agency for Meteorological, Climatological and Geophysics (BMKG) and United States Geological Survey’s (USGS). Six cluster was an optimal number of cluster base-on cluster analysis implementing Valley Tracing and Hill Climbing algorithm, while Hierarchical K-means was applied for datasets clustering. A quality evaluation was then conducted to measure the proposed model performance for two different thresholds. The experimental result shows that the model gave better performance for predicting an aftershock occurrence that equal or larger than 6 Richter’s scale magnitudes. Keywords— Artificial neural networks, earthquake prediction, cluster analysis


I. INTRODUCTION
Indonesia has a high risk to geological disasters as it lies between three active plates of the world: the Eurasian, Indo-Australian and Pacific plates.Therefore, a system that is able to predict the earthquake will greatly help to minimize the risk of losses that arise.A series of earthquakes are not randomly formed, but follow a spatial pattern with a trigger that results in an earthquake event.In other words, natural disasters (e.g., earthquakes) rarely appear on their own, but instead they tend to form a complex network of interacting faults as in [1][2][3].However, an earthquake prediction should state where, when, how big, and how probable the predicted event is [4].Several studies applied artificial neural networks (ANN) for earthquake prediction as in [5][6][7].
Seismic clustering is the first stage to analyze earthquake risk which based on a variety of seismology criteria.By using a clustering scheme, it is possible to retrieve spatiotemporal pattern that created by events.Although the automatic identification of the optimal number of clusters on seismic data is a very difficult problem in the process of data clustering [8], but the optimal number of clusters can be determined by measuring variance within and variance between each cluster, this is known as cluster analysis [9][10].The data used are seismic data of all regions of Indonesia obtained from BMKG and USGS in year 1910 to 2017.This data is filtered using magnitude of completeness of 5.1 Richter magnitude scale [11].
This study proposed an ANN model to predict an earthquake during the next five days after an earthquake occurrence with magnitude equal or larger than given threshold for a selected cluster in all region of Indonesia.Aftershock prediction can be used by the authorities to deploy precautionary policies.

II. RESEARCH METHOD
This section describes the system design of earthquake prediction.Figure 1 illustrates the overall process.

A. Data Acquisition
The earthquake data covers the entire territory of Indonesia, which is the boundary 5.98 north latitude -11 south latitude and east longitude 94.12 -140.98 east longitude.The earthquake dataset in this research is taken from catalogue of Agency Meteorology, Climatology and Geophysics (BMKG) and United States Geological Survey (USGS) from 1910 to 2017.The dataset consists of 82,580 seismic events with magnitude 1 -10 ML, and depth of 0-650 km.

INTERNATIONAL JOURNAL ON INFORMATICS VISUALIZATION VOL 2 (2018) NO 1 e-ISSN : 2549-9904 ISSN : 2549-9610
Figure 1 Research approach for earthquake prediction The magnitude of completeness (MC) of the catalogue for earthquake dataset is determined first before implements clustering process.It uses the Gutenberg-Richter frequency size distribution [12].Magnitude of completeness also called as threshold or cut-off magnitude.Seismic events with magnitude under the MC will be eliminated [11].
Incompleteness of seismic data will give the result in seismic risk parameters resulting into overestimated or underestimated.In this research, we use magnitude 5.1 ML for magnitude of completeness (MC) according to [11].So, total number of seismic data after implementing magnitude of completeness is 9,233 events.

B. Cluster Analysis
The main objective of cluster analysis is to generate groups of data from a large dataset in to small group with the same characteristics.Good clustering results will produce an accurate representation of the behaviour of a system [8].This concept is to extract the most similar (or dissimilar) separated sets of objects according to a given similarity (dissimilarity) measure [12].Thus, this algorithm is the basis for extracting the dataset to obtain useful information for finding pattern recognition in the data [8].
Clustering process is divided into 2 schemes.The first process is to find the number of cluster.The number of clusters are the amount of cluster to group large data into small groups.There are many algorithms to determine the number of cluster.The easiest method to find the number of cluster is to generate randomly.But, this method does not guarantee to find the global optimal.Choosing a large number of clusters does not necessarily imply better for clustering.On the contrary, results could be less accurate representation.Therefore, the optimal number of clusters has to be determined.Since the number of clusters selected to cluster dataset is one of the most critical decisions in clustering techniques, several approaches have been developed in order to determine this number [11].In this paper, Valley Tracing and Hill Climbing algorithm are implemented to select the optimal number of clusters [9].It analyzes the moving variance of clusters for each stage of cluster construction and observes the pattern to find the global optimum as well as to avoid the local optima.
The algorithm to find the optimal number of clusters [9] is described as follows: 1. Set as each data of A, where A is attribute of ndimensional vector., and as optimal number of cluster The second process is to cluster the dataset.into a particular cluster based on the most similar.Before running this process, it requires initial cluster.Therefore, the number of clusters obtained from the first process, becomes the initial clusters for clustering dataset.In this paper, data clustering uses Hierarchical K-means clustering.It combines two clustering algorithms: Hierarchical algorithm and Kmeans algorithm [13].
The Hierarchical K-means clustering algorithm is as follows [13]: 1. Set as each data of A, where A is attribute of ndimensional vector.2. Set K as the predefined number of clusters.

C. Proposed Neural Network
The architecture of the neural network model for predicting the earthquake occurrence is shown in Figure 2..The input neuron layer has seven nodes representing the seven seismicity indicators.They are used to recognize the pattern of earthquake occurrences.It can also be employed for predicting the nature of the future events and as well as mitigation of earthquake risk [11], [14].The seven seismicity indicators are shown in Table 1.
To find seismic parameters, firstly, the earthquake data should be cluster into small data groups based on the optimal number of cluster.Then, distribution of data is utilized to investigate the seismic parameters for each clusters, for example, b value.

No.
Notation Description Maximum magnitude during the last week 7 x7i Dynamic GR's law Five input parameters (b-value) were obtained from the Gutenberg-Richter law's.The b parameter characterizes the relative size of distribution of events and it depends on the stress regime and seism tectonic behaviour of the region.In this paper, it is calculated using the last 50 recorded quakes [15].The b parameter value is determined as: (1 Where Mi is the magnitude for the i-th earthquake in the clustering dataset.Mo is magnitude completeness or cut off magnitude.
Then, increments of b are calculated: (2) (5) From these equations, it can be concluded that 70 earthquakes are required to calculate the xi values, therefore, to obtain the five of the ANN inputs.
The sixth input variable x6i is the maximum magnitude Ms from the recorded quakes during the last week in the analyzed area [15].It is defined as: when (7) where the time t is measured in days.
The seventh input variable x7i identifies the probability of recording an earthquake with magnitude larger or equal to Ms.This input is to include Gutenberg-Richter's law in a dynamic way which is calculated from the probability density function (PDF) [15].For example, this study uses magnitude Ms with 6.0 ML so the parameter x7i is defined as: (8) The output parameter has one variable yi.yi parameter is the maximum magnitude Ms observed in the cell under analysis during the next five days.It has been set to 0 for such situations where no earthquake with magnitude equal or greater to Ms.In this scenario, Ms is set to 6.0 ML. (9) where the time t is measured in days.
The training vector associated to the i-th earthquake can be expressed as: (10) The input vector (x) has seven elements, and the output vector (y) has a single element.
The number of hidden layers and the number of nodes in each hidden layer are determined by trial and error for best results.First, one hidden layer and several nodes are arranged .The number of nodes is increased gradually to see the pattern of the movement of the result.When the number of nodes is increased but no improvement in its value, the experiment is stopped and the best result is taken.
The activation function in this paper uses sigmoid activation function.It is defined as: (11) where, (12) where, wij are the connection weights between unit i and unit j.
The learning method in this paper is back propagation.It consists of two main phases: propagation and weight update.

D. Performance evaluation
In this prediction, an earthquake is said to occur during the next five days when its magnitude reaches the specified threshold (Ms).Several parameters used to evaluate the performance of system [2] are: 1. True positives (TP).The number of times that the ANN predicted an earthquake and an earthquake did occur during the next five days.

True negatives (TN).
The number of times that the ANN did not predict an earthquake and no earthquake occurred during the next five days.

False positives (FP) or false alarm. The number of times
that the ANN predicted an earthquake but no earthquake occurred during five days.

False negatives (FN).
The number of times that the ANN did not predict an earthquake but an earthquake did occur during the next five days.8. Specificity (Sp).The proportion of negatives that are correctly identified, calculated as follow: (16)

III. RESULTS AND DISCUSSION
In this section, it is explained the results of research and at the same time is given the comprehensive discussion.

A. Dataset Clustering
Grouping dataset in to small group is well known as clustering.There are three steps in this stage.First step is to get optimal number of cluster, while the second step is data clustering process, and the last step is visualizing on map.
Earthquake dataset in this paper is clustered using epicentre parameter, that consists of longitude and latitude.The optimal number of cluster is obtained by applying Valley Tracing and Hill Climbing algorithm.Six clusters are the optimal number resulted from these processes.It becomes an initial number of clusters for dataset clustering.Afterwards, the second step is data clustering process, it uses Hierarchical K-means clustering algorithm.
To visualize the result of clustering process, it uses PHP/Java Bridge tools and Google map on Indonesia map [16].Every cluster has different colours in order to differ one cluster to another.Visualization of spatial data distribution based on clustering can be seen in Figure 3.A number of dataset of each cluster is shown by Table 2, while the Indonesian territory belonging to each cluster can be seen in Table 3.

B. Seismic Parameters Extraction
The earthquake catalogue used in this paper has been obtained from the BMKG and USGS Services.Before analyzing earthquake parameters, the completeness of the data each area should be analyzed first.Year of completeness of seismic catalogue can be graphically calculated.On the graph, the time is inversely plot to the xaxis and the accumulated number of quakes on the y-axis.
The slope of the curve, which represents the annual rate in the data, has a straight line on the curve showing that at that time span is constant.From Figure 4 4.
The model was tested using earthquake catalogue between June 2008 and December 2016.It also consists of 489 datasets for cluster0, 152 data sets for cluster1, and datasets for cluster2, cluster3, cluster4, cluster5 are 237, 199, 307, 318.Sample of testing datasets for cluster0 showing sevenelement input and one output parameter are presented in Table 5.
The best result according to trial and error is the ANN model with learning rate sets to 0.1, two hidden layers with thirty-two nodes each other and 10000 iterations.7 show the experimental result of testing scenarios of proposed ANN model for quality parameters.In the first scenario used the input node x7i = P(Ms≥5.5)and the second one used the input node x7i = P(Ms≥6).IV.CONCLUSION An ANN model has been built to predict an earthquake during the next five days after an earthquake occurrence in a selected cluster.The best result according to trial and error is the ANN model with learning rate sets to 0.1, two hidden layers with thirty-two nodes each other and 10000 iterations.The proposed model gave better performance for magnitude threshold(Ms) ≥ 6 ML.
V. ACKNOWLEDGEMENT

2 . 6 .
Set K as the predefined number of clusters 3. Apply clustering algorithm with number of clusters 4Repeat from step 3 while j<n-2.7. Calculate the moving variance with valley tracing -hill climbing ∂ 8. Put threshold on moving variance ∂ for automatic clustering 9. Rank the moving variance

3 .
Determine p as numbers of computation 4. Set i=1 as initial counter 5. Apply K-means algorithm.6. Record the centroids of clustering results as 7. Increment i=i+1 8. Repeat from step 5 while i<p.9. Assume as new dataset, with K as predefined clusters 10.Apply hierarchical algorithm 11.Record the centroids of clustering result as 12.Then, as initial cluster centers for Kmeans clustering.

Figure 2
Figure 2 Architecture of the proposed neural network

Figure 3
Figure 3 Magnitude distribution based on cluster from 1910 to 2017 with Magnitude completeness MC 5.1 ML, and total number of earthquake 9,233 events , the earthquake data is obtained in the estimated timeframe from January 1975 to May 2008 for data training.The rest of data between June 2008 to December 2016 is used as sample testing datasets.

Figure 4
Figure 4 Year of completeness for all studied areas C. Neural Network Training and Testing Network training uses earthquake catalogues between January 1975 to May 2008.It consists of 2081 training data sets for cluster0, 369 training datasets for cluster1, and 759, 739, 1009, 1113 training datasets for cluster2, cluster3, cluster4, cluster5.Sample of training datasets for cluster0 are presented in Table4.The model was tested using earthquake catalogue between June 2008 and December 2016.It also consists of 489 datasets for cluster0, 152 data sets for cluster1, and datasets

Table 1
The seven seismicity indicators

Table 2
The amount of data member of clusters

Table 3
Dataset Member of each cluster based on region

Table 4
Sample of training datasets in cluster0

Table 5
Sample of testing datasets in cluster0

Table 6 and
Table

Table 6
ANN performance of proposed model with x7i = P (Ms≥5.5)

Table 7
ANN performance of proposed model with x7i = P (Ms≥6.0)