Combination of Feature Extractions for Classification of Coral Reef Fish Types Using Backpropagation Neural Network

Abstract: Feature extraction is important for obtaining information from digital images, and its results are used in the classification process. The success of a digital image classification study depends heavily on the feature extraction method selected, and several studies have shown that combining feature extraction methods produces a more accurate classification. Marine fish are classified by identifying them from distinguishing characteristics such as shape, body pattern, color, or other features. This study aimed to classify coral reef fish species based on the characteristics contained in fish images using the Backpropagation Neural Network (BPNN) method. The data used in this research were collected directly from Bunaken National Marine Park (BNMP) in Indonesia. The first stage was to extract shape features using the Geometric Invariant Moment (GIM) method, texture features using the Gray Level Co-occurrence Matrix (GLCM) method, and color features using the Hue Saturation Value (HSV) method. The three sets of feature extraction values were used as input for the next stage, namely the classification process using the BPNN method. Testing with 5-fold cross-validation gave a lowest test accuracy of 85%, a highest of 100%, and an average of 96%. This indicates that the model derived from the combination of the three feature extraction methods and trained with the BPNN algorithm performs well in classifying coral reef fish.


I. INTRODUCTION
Bunaken National Marine Park (BNMP) is located in North Sulawesi Province, Indonesia. BNMP lies in the centre of the world's Coral Triangle [1], so it has high marine biodiversity and is home to much marine biota, including many types of reef fish. The data used in this study were collected directly from the BNMP area. Fish have different shapes and sizes, which give each species specific characteristics in terms of body shape and size [2]. The many differences in fish characteristics can make it difficult to identify fish species without knowledge of fisheries. Fish recognition is a way of identifying fish based on particular characteristics [3], such as the fish's shape, body pattern, color, or other features. For this reason, it is necessary to be able to classify types of marine fish with the help of computers [4], [5].
Artificial Intelligence (AI) has been widely applied to solve problems that generally require the reasoning of a human expert [6], [7]. One of the AI methods widely applied in various fields is the Artificial Neural Network (ANN) trained with backpropagation, known as the Backpropagation Neural Network (BPNN) [8], [9]. BPNN is well suited to classification because it can adapt the network to the data provided during the learning process. The BPNN method has been applied to predict and select prospective recipients of the Bidikmisi scholarship based on poverty level, with a fairly good system accuracy of 85.6% [10]. A backpropagation algorithm has also been used to build a fish pattern recognition system in which image segmentation relies on color texture measurements [11]; the results showed that it could classify fish as poisonous or non-poisonous. In addition, fish classification has been carried out using an image processing approach and a Probabilistic Neural Network, which classified fish species with an accuracy of 89.65% [12].
Feature extraction is essential for obtaining the information in a digital image, which is then used in the classification process. A comparison of the Gray Level Co-occurrence Matrix (GLCM), Local Binary Patterns (LBP), Wavelet, Ranklet, Granulometry, and Laws' Masks feature extraction methods, evaluated on a case study of exotic wood texture classification, shows that LBP is the most appropriate for analyzing wood texture [13]. A performance comparison of the SURF, Harris, BRISK, and FAST feature extraction methods shows that SURF is the best for classifying images [14]. Furthermore, when GLCM feature extraction was integrated with geometric feature extraction of a region of interest (ROI) for classifying tuna, the best classification accuracy, 86.76%, was obtained through the GLCM method [15]. In addition, the use of HSV color feature extraction and GLCM texture feature extraction to identify the type of woven fabric shows that the combined color and texture features reach an accuracy of 91.67% [16]. Siar and Teshnehlab [17] used a combination of feature extraction methods to categorize tumor disease from digital images using the CNN method; the accuracy obtained was very high, namely 99.7%, and was higher than when only one feature extraction method was used. A combination of feature extraction and selection in texture analysis was also used by Shang and Li [18] to address the problem of image classification. The success of a digital image classification study therefore depends heavily on the feature extraction method selected, and the studies above indicate that combining feature extraction methods produces more accurate results.
Based on the previous studies described above, the feature extraction process greatly influences the classification results. The novelty of this research is the combination of the Geometric Invariant Moment (GIM), Gray Level Co-occurrence Matrix (GLCM), and Hue Saturation Value (HSV) feature extraction methods for the classification of coral reef fish species using the BPNN method.

II. MATERIAL AND METHODS
The fish images were collected from local coral reef fish in the waters of the Bunaken National Marine Park, North Sulawesi Province, Indonesia. The first stage was a feature extraction process using the GIM, GLCM, and HSV methods to obtain the shape, texture, and color feature values. The next stage was classification with the BPNN method, using the feature extraction values as input. Figure 1 shows the flow chart of the coral reef fish classification system in this study.

A. Research Data
The dataset in this study was taken from four species of coral reef fish, namely Yellowtail snapper (Lutjanus ehrenbergii), Triggerfish (Odonus niger), Rengginan fish (Myripristis berndti), and Red bigeyebrownspot (Priacanthus tayenus), which were labeled as classes I, II, III, and IV, respectively. This coral reef fish species dataset was collected from the waters of BNMP, North Sulawesi. Figure 2 shows examples of the fish species used in the dataset. Table I shows the 100 coral reef fish images, consisting of 20 images each from the Lutjanus ehrenbergii and Odonus niger species and 30 images each from the Myripristis berndti and Priacanthus tayenus species.

B. Pre-processing
In the pre-processing stage, the images in the fish image dataset were prepared for feature extraction. The first process was to crop each fish image so that it contained the complete object. Then the resize process was carried out to

C. Feature Extraction
Feature extraction is the process of taking features of an object that can describe the characteristics of the object [19]. Feature extraction aims to increase the classifier's efficiency by finding the most compact and informative feature set (distinct patterns) [20]. The feature extraction methods used in this study were GIM, GLCM, and HSV.

1) Geometric Invariant Moment: Geometric Invariant Moment (GIM) is a shape feature extraction method. The shape characteristics can relate to position, area, and other properties. Hu introduced this method in 1962. The invariant moments are not affected by translation, dilation (scaling), rotation, or even reflection, and they are obtained by calculating seven quantities of an object [21]. The process starts by calculating the moment values, followed by the central moments; this stage produces seven central moment values. The central moments are then normalized, and the seven invariant moment values are calculated from the normalized central moments. The following equation calculates the central moment (µpq) for an image.
$$\mu_{pq} = \sum_{i}\sum_{j} (i - \bar{i})^{p} (j - \bar{j})^{q} f(i,j) \quad (1)$$
The intensity value is f(i,j), with i as the row index and j as the column index, and $(\bar{i}, \bar{j})$ is the centroid of the image. The normalized central moment (ηpq) is calculated using the following equation
$$\eta_{pq} = \frac{\mu_{pq}}{\mu_{00}^{\gamma}}, \quad \gamma = \frac{p+q}{2} + 1 \quad (2)$$
The first invariant moment is
$$\phi_1 = \eta_{20} + \eta_{02} \quad (3)$$
while the seventh invariant moment is
$$\phi_7 = (3\eta_{21} - \eta_{03})(\eta_{30} + \eta_{12})\left[(\eta_{30} + \eta_{12})^2 - 3(\eta_{21} + \eta_{03})^2\right] - (\eta_{30} - 3\eta_{12})(\eta_{21} + \eta_{03})\left[3(\eta_{30} + \eta_{12})^2 - (\eta_{21} + \eta_{03})^2\right] \quad (4)$$

2) Gray Level Co-occurrence Matrix: Gray Level Co-occurrence Matrix (GLCM) is a texture feature extraction method that analyzes the gray levels of the pixels of an image. Texture is recognized based on second-order statistical characteristics. This is done by forming a co-occurrence matrix from the image data and then calculating second-order statistical features that represent the image as a feature vector. The characteristics used are contrast, correlation, energy, and homogeneity [22]. The texture features are calculated using equations (5) to (8).
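For reference, one widely used formulation of these four texture features is sketched below, assuming a normalized co-occurrence matrix whose entries sum to 1. The numbering follows the order in which the features are listed and is an assumption, as is the energy variant: energy is sometimes defined without the square root, and the source does not state which form was used.

```latex
\begin{align}
\text{Contrast}    &= \sum_{i,j=0}^{Levels-1} P_{i,j}\,(i-j)^2 \tag{5}\\
\text{Correlation} &= \sum_{i,j=0}^{Levels-1} P_{i,j}\,
                      \frac{(i-\mu_i)(j-\mu_j)}{\sqrt{\sigma_i^2\,\sigma_j^2}} \tag{6}\\
\text{Energy}      &= \sqrt{\sum_{i,j=0}^{Levels-1} P_{i,j}^2} \tag{7}\\
\text{Homogeneity} &= \sum_{i,j=0}^{Levels-1} \frac{P_{i,j}}{1+(i-j)^2} \tag{8}
\end{align}
% Auxiliary quantities (means and variances of the row and column indices):
% \mu_i = \sum_{i,j} i\,P_{i,j}, \qquad \mu_j = \sum_{i,j} j\,P_{i,j},
% \sigma_i^2 = \sum_{i,j} (i-\mu_i)^2 P_{i,j}, \qquad
% \sigma_j^2 = \sum_{i,j} (j-\mu_j)^2 P_{i,j}
```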
Where, i and j are the row and column indices of the GLCM matrix. The Levels value is the number of gray tones in the digital image, which range from 0 to 255 (Levels = 256). Pi,j is the value of the GLCM matrix at position (i, j).
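The GLCM procedure described above can be sketched in a few lines of plain Python. This is a minimal illustration rather than the authors' implementation: the function names (`glcm`, `glcm_features`, `texture_vector`) are hypothetical, the pixel distance of 1 and the symmetric, normalized matrix are assumptions, and energy is taken as the square root of the sum of squared entries. Computing the four features at the four orientations 0º, 45º, 90º, and 135º gives the 16 texture values used later as network inputs.

```python
import math

# Pixel offsets (row, col) for the four orientations at distance 1 (assumed).
OFFSETS = {0: (0, 1), 45: (-1, 1), 90: (-1, 0), 135: (-1, -1)}


def glcm(image, angle, levels):
    """Normalized, symmetric co-occurrence matrix for one orientation.

    image is a list of rows of integer gray levels in range(levels).
    """
    dr, dc = OFFSETS[angle]
    counts = [[0.0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dr, c + dc
            if 0 <= r2 < rows and 0 <= c2 < cols:
                a, b = image[r][c], image[r2][c2]
                counts[a][b] += 1  # count the pair ...
                counts[b][a] += 1  # ... and its mirror (symmetric GLCM)
    total = sum(map(sum, counts))
    return [[v / total for v in row] for row in counts]


def glcm_features(p):
    """Contrast, correlation, energy, and homogeneity of a normalized GLCM."""
    n = len(p)
    cells = [(i, j, p[i][j]) for i in range(n) for j in range(n)]
    mu_i = sum(i * v for i, _, v in cells)
    mu_j = sum(j * v for _, j, v in cells)
    var_i = sum((i - mu_i) ** 2 * v for i, _, v in cells)
    var_j = sum((j - mu_j) ** 2 * v for _, j, v in cells)
    contrast = sum((i - j) ** 2 * v for i, j, v in cells)
    denom = math.sqrt(var_i * var_j)
    # A constant image has zero variance; treat it as perfectly correlated.
    correlation = (
        sum((i - mu_i) * (j - mu_j) * v for i, j, v in cells) / denom
        if denom > 0 else 1.0
    )
    energy = math.sqrt(sum(v * v for _, _, v in cells))
    homogeneity = sum(v / (1 + (i - j) ** 2) for i, j, v in cells)
    return [contrast, correlation, energy, homogeneity]


def texture_vector(image, levels):
    """Four features at four orientations: a 16-value texture vector."""
    features = []
    for angle in (0, 45, 90, 135):
        features.extend(glcm_features(glcm(image, angle, levels)))
    return features
```

For a real 8-bit grayscale image, `levels` would be 256; a small toy image with four gray levels is enough to check the arithmetic by hand.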

3) Hue Saturation Value:
Hue Saturation Value (HSV) is used to extract the color features of the image. At this stage, each image pixel is represented in a histogram by quantizing the color histogram into 72 bins. Thus, after the HSV matrix is obtained, the next step is to quantize the color histogram. This process improves performance and reduces the computational burden of processing the image pixels [16]. A simple equation to get the HSV value is as follows:
Where, H is Hue, S is Saturation, and V is Value. R, G, and B are the red, green, and blue values that have not been normalized. Value indicates the brightness of the color and ranges from 0 to 100%. If the value is 0, the color is black. The greater the value, the brighter the color and the more new variations of the color appear.
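The 72-bin color histogram can be illustrated with the short sketch below. It relies on Python's standard `colorsys.rgb_to_hsv` for the color conversion rather than the paper's own equation, and it assumes a uniform 8 x 3 x 3 split of hue, saturation, and value, which is one common way to arrive at 72 bins; the source does not state the exact quantization boundaries. The function name `hsv_histogram` is hypothetical.

```python
import colorsys

# Assumed quantization: 8 hue x 3 saturation x 3 value levels = 72 bins.
H_BINS, S_BINS, V_BINS = 8, 3, 3


def hsv_histogram(pixels):
    """Normalized 72-bin HSV histogram.

    pixels is a non-empty iterable of (R, G, B) tuples with values in 0..255.
    """
    hist = [0.0] * (H_BINS * S_BINS * V_BINS)
    count = 0
    for r, g, b in pixels:
        # colorsys expects and returns values in the range 0..1.
        h, s, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
        # Uniform quantization; min() keeps the value 1.0 inside the last bin.
        hq = min(int(h * H_BINS), H_BINS - 1)
        sq = min(int(s * S_BINS), S_BINS - 1)
        vq = min(int(v * V_BINS), V_BINS - 1)
        hist[hq * S_BINS * V_BINS + sq * V_BINS + vq] += 1
        count += 1
    return [x / count for x in hist]
```

Normalizing by the pixel count makes the histogram independent of image size, so images of different resolutions yield comparable 72-value color vectors.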

D. Backpropagation Neural Network
Backpropagation is a supervised artificial neural network training method [23]. It evaluates the error contribution of each neuron after a set of data has been processed. The purpose of backpropagation is to modify the weights so that the trained neural network correctly maps arbitrary inputs to outputs [24]. The image classification process in this study uses the Backpropagation Neural Network, starting with the input values from the feature extraction. The feedforward calculation is then carried out to get the output value, which is compared with the actual output to get the error value. The error value is propagated back to every neuron in the previous layers to update the weight matrices and minimize the error rate. This process continues until the network produces the expected output or is considered capable of classifying.
Fig. 3 The architecture of the BPNN Coral Reef Fish Identification System
Fig. 3 shows the BPNN architecture for identifying coral reef fish, with 95 neurons in the input layer, 128 and 64 neurons in the two hidden layers, and four neurons in the output layer. The ReLU activation function is used in the hidden layers, while the Softmax activation function is used in the output layer.
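The feedforward and error-propagation steps described above can be sketched for the 95-128-64-4 architecture as follows. This is a minimal plain-Python illustration, not the authors' implementation: the cross-entropy loss, the He-style weight initialization, the plain per-sample gradient-descent update, and all function names are assumptions made for the example.

```python
import math
import random


def init_layers(sizes, seed=0):
    """Create (weights, biases) for each layer; He-style init is assumed."""
    rng = random.Random(seed)
    layers = []
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        scale = math.sqrt(2.0 / n_in)
        w = [[rng.gauss(0.0, scale) for _ in range(n_in)] for _ in range(n_out)]
        layers.append((w, [0.0] * n_out))
    return layers


def softmax(z):
    m = max(z)  # subtract the maximum for numerical stability
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def forward(layers, x):
    """Return the activations of every layer, the input included."""
    acts = [x]
    for k, (w, b) in enumerate(layers):
        z = [sum(wi * ai for wi, ai in zip(row, acts[-1])) + bi
             for row, bi in zip(w, b)]
        is_output = k == len(layers) - 1
        # ReLU in the hidden layers, softmax in the output layer.
        acts.append(softmax(z) if is_output else [max(0.0, v) for v in z])
    return acts


def train_step(layers, x, target, lr):
    """One backpropagation update for a single sample; returns the loss."""
    acts = forward(layers, x)
    probs = acts[-1]
    loss = -math.log(max(probs[target], 1e-12))  # cross-entropy (assumed)
    # With softmax and cross-entropy the output error is (probs - one-hot).
    delta = [p - (1.0 if i == target else 0.0) for i, p in enumerate(probs)]
    for k in range(len(layers) - 1, -1, -1):
        w, b = layers[k]
        a_prev = acts[k]
        back = None
        if k > 0:
            # Propagate the error backwards before the weights are changed.
            back = [sum(w[j][i] * delta[j] for j in range(len(delta)))
                    for i in range(len(a_prev))]
            # ReLU derivative: 1 where the activation was positive, else 0.
            back = [g if a > 0 else 0.0 for g, a in zip(back, a_prev)]
        for j, d in enumerate(delta):
            row = w[j]
            for i, a in enumerate(a_prev):
                row[i] -= lr * d * a
            b[j] -= lr * d
        if back is not None:
            delta = back
    return loss
```

Calling `init_layers([95, 128, 64, 4])` builds the layer sizes named in Fig. 3, and repeated calls to `train_step` on labeled feature vectors reduce the loss; in practice a numerical library would be used instead of nested Python lists.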

III. RESULTS AND DISCUSSION

A. Initial Processing
The results of shape feature extraction using the GIM method, texture feature extraction using the GLCM method, and color feature extraction using the HSV method are used as input values in the input layer. The number of inputs is 95, consisting of 7 shape feature values, 16 texture feature values, and 72 color feature values. Examples of input values at the input layer for the images in Figure 2 can be seen in Tables II, III, and IV.
The next stage is to convert the RGB color image to grayscale, which simplifies the calculations on image objects. The result of the RGB to grayscale conversion is then used in the GIM and GLCM feature extraction processes. For the HSV feature extraction, the result of the RGB to HSV color conversion is used. Figure 4 shows an RGB image converted to the grayscale and HSV color modes.

Fig. 4 Example of RGB to Grayscale and HSV Color Conversion Image (panels: RGB, Grayscale, HSV)
Extracting shape features from fish images with the GIM method allows the shape features to be recognized even after translation (shift), dilation (change of scale), rotation, and reflection (mirroring). The seven invariant moment values are shown in Table II. The texture feature extraction with the GLCM method describes each fish image through four features, namely correlation, homogeneity, contrast, and energy, each computed at four angle orientations: 0º, 45º, 90º, and 135º. The results of texture feature extraction are shown in Table III. In the next process, each pixel of the fish image is represented in a histogram by quantizing the image's color into 72 bins. The color feature values are obtained by first converting the RGB image to HSV, which yields an HSV matrix, and then quantizing the color histogram. The matrix of the color histogram quantization results is shown in Table IV.
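The invariance of the GIM shape features mentioned above can be checked numerically with a small sketch. The function below computes the seven Hu invariant moments from a grayscale image given as a list of rows, following the standard definitions (raw moments, central moments, normalization, then the seven invariants). The function name `hu_moments` is hypothetical and the code is illustrative only, not the authors' implementation.

```python
def hu_moments(img):
    """Seven Hu invariant moments of a 2-D grayscale image (list of rows)."""
    def raw(p, q):
        return sum((i ** p) * (j ** q) * v
                   for i, row in enumerate(img) for j, v in enumerate(row))

    m00 = raw(0, 0)
    ci, cj = raw(1, 0) / m00, raw(0, 1) / m00  # centroid (row, column)

    def eta(p, q):
        # Central moment, normalized by m00 ** ((p + q) / 2 + 1).
        mu = sum(((i - ci) ** p) * ((j - cj) ** q) * v
                 for i, row in enumerate(img) for j, v in enumerate(row))
        return mu / m00 ** ((p + q) / 2 + 1)

    n20, n02, n11 = eta(2, 0), eta(0, 2), eta(1, 1)
    n30, n03, n21, n12 = eta(3, 0), eta(0, 3), eta(2, 1), eta(1, 2)
    a, b = n30 + n12, n21 + n03
    c, d = n30 - 3 * n12, 3 * n21 - n03
    return [
        n20 + n02,
        (n20 - n02) ** 2 + 4 * n11 ** 2,
        c ** 2 + d ** 2,
        a ** 2 + b ** 2,
        c * a * (a ** 2 - 3 * b ** 2) + d * b * (3 * a ** 2 - b ** 2),
        (n20 - n02) * (a ** 2 - b ** 2) + 4 * n11 * a * b,
        d * a * (a ** 2 - 3 * b ** 2) - c * b * (3 * a ** 2 - b ** 2),
    ]


# A small asymmetric test pattern, a shifted copy, and a 90-degree rotation.
pattern = [[0, 1, 2, 0], [0, 3, 1, 0], [4, 1, 0, 0]]
shifted = [[0] * 6] + [[0, 0] + row for row in pattern]
rotated = [list(r) for r in zip(*pattern[::-1])]
```

Comparing `hu_moments(pattern)` with `hu_moments(shifted)` and `hu_moments(rotated)` gives matching values up to floating-point error. Translation and 90-degree rotation are exact on a pixel grid, whereas invariance to scaling holds only approximately for resampled images.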

B. Training and Validation
Before training, the coral reef fish image dataset was divided into 80% training data and 20% testing data. The BPNN method was used as the training algorithm, with 80 local coral reef fish images as training data. The number of epochs, batch size, and learning rate were set to 100, 6, and 0.001, respectively. The model was trained by applying 5-fold cross-validation, which means the data was divided into five subsets and the training was carried out five times. Figure 5 shows the training accuracy and loss curves for the 5-fold cross-validation. The training plots for the five folds show that the accuracy in each fold stabilized from epoch 21 to epoch 100 at 100%, with the loss value approaching 0.
The next stage was to run the testing process on the five models obtained from the folds during training. 20 testing images were used in this process, and the performance of each model was observed using a confusion matrix, as shown in Tables V to IX, where correct predictions are marked in yellow and incorrect predictions in red. I to IV denote the four classification classes.

TABLE V
CONFUSION MATRIX OF FOLD-1

20 data      Predict
Actual       I    II   III  IV
I            5    0    0    0
II           0    5    0    0
III          0    0    7    0
IV           0    0    0    3
Accuracy = 100%

TABLE VI
CONFUSION MATRIX OF FOLD-2

20 data      Predict
Actual       I    II   III  IV
I            3    0    0    0
II           0    5    0    0
III          0    0    7    0
IV           0    0    0    5
Accuracy = 100%

TABLE VII
CONFUSION MATRIX OF FOLD-3

20 data      Predict
Actual       I    II   III  IV
I            4    0    0    0
II           0    5    0    0
III          0    0    5    1
IV           0    0    0    5
Accuracy = 95%

TABLE VIII
CONFUSION MATRIX OF FOLD-4

TABLE IX
CONFUSION MATRIX OF FOLD-5

The confusion matrices of the five models show that one image was misclassified in fold three and three images were misclassified in fold four, while all images were correctly predicted in folds 1, 2, and 5. The summary of the testing results is shown in Table X.
The lowest accuracy, 85.00%, was found in fold four, and the highest, 100%, was obtained in folds 1, 2, and 5, giving an average accuracy of 96.00%. These results show that the combination of GIM, GLCM, and HSV feature extraction for classification using the BPNN method can classify local coral reef fish with good accuracy, with no fold falling below 85%.
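The reported figures can be checked with a short calculation. The sketch below derives the accuracy of a fold from its confusion matrix (correct predictions on the diagonal divided by all test images) and averages the five fold accuracies. The fold-3 matrix is the one reported above, with one class III image predicted as class IV; the accuracies of the other folds are taken from the text, and the names `accuracy`, `fold3`, and `fold_accuracies` are illustrative.

```python
def accuracy(cm):
    """Percentage of correct predictions in a square confusion matrix."""
    correct = sum(cm[k][k] for k in range(len(cm)))
    total = sum(map(sum, cm))
    return 100.0 * correct / total


# Fold 3: rows are actual classes I..IV, columns are predicted classes I..IV.
fold3 = [
    [4, 0, 0, 0],
    [0, 5, 0, 0],
    [0, 0, 5, 1],
    [0, 0, 0, 5],
]

# Accuracies of folds 1 to 5; folds 1, 2, 4, and 5 are taken from the text.
fold_accuracies = [100.0, 100.0, accuracy(fold3), 85.0, 100.0]
mean_accuracy = sum(fold_accuracies) / len(fold_accuracies)
```

Here 19 of the 20 fold-3 images lie on the diagonal, which gives 95%, and the mean of 100, 100, 95, 85, and 100 is 96%, in agreement with the summary above.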

IV. CONCLUSION
Feature extraction was carried out in this study to obtain information from fish images, and the extracted features were used in the classification process. The results showed that combining shape, texture, and color features extracted with the GIM, GLCM, and HSV methods is very influential in classifying reef fish with the BPNN training algorithm. The evaluation showed a lowest accuracy of 85%, a highest of 100%, and an average of 96%. This indicates that the proposed model could be used to classify reef fish based on digital imagery. The contribution of this study is therefore to recommend that researchers use a combination of feature extraction methods to produce a more accurate classification. In the future, we will develop an intelligent online system to classify reef fish based on the model proposed in this study, which the community can use as marine education content, especially in Bunaken National Marine Park of Manado, Indonesia.