Automatic Weight of Color, Texture, and Shape Features in Content-Based Image Retrieval Using Artificial Neural Network

Image retrieval is the process of finding images in a database that are similar to a query image by measuring how close the feature values of the query image are to those of other images. Image retrieval is currently dominated by approaches that combine several different representations or features. Combining image features such as color, texture, and shape requires an optimal weight for each feature. In this study, we use a multi-layer perceptron (MLP) artificial neural network to obtain these optimal feature weights automatically. Color moments provide nine color features, the Gray Level Co-occurrence Matrix (GLCM) provides four texture features, and Hu moments provide seven shape features; these 20 features form the input layer of our MLP network. Three neurons in the output layer give the automatic weight of each feature type. These weights are used to combine each feature's role in obtaining the relevant images. Euclidean distance is used to measure similarity. The average precision values obtained using automatic feature weights are 93.94% for the synthetic dataset, 91.19% for the Coil-100 dataset, and 54.31% for the Wang dataset. These results differ from the target by 5.06% on average, so automatic feature weighting works well. These values are obtained with a hidden layer size of 11 and a learning rate of 0.1. In addition, automatic feature weighting gives more accurate results than manual feature weighting.


I. INTRODUCTION
Content-based image retrieval aims to search images in huge databases efficiently and accurately, based on visual image content and according to user needs [1]. Image retrieval techniques have been widely researched to obtain efficient methods, but there are currently no universally available techniques for image extraction, indexing, and retrieval. Therefore, this remains an open problem and an active area of research [2]. Hameed, Abdulhussain, and Mahmmod [3] surveyed, analyzed, and compared the latest methodologies in the CBIR field from 2015 to 2020, which inspired further research in the field.
Color, texture, and shape are low-level features that are particularly popular in content-based image retrieval (CBIR) [4]. Representing and indexing images by visual information such as color, shape, and texture is a difficult part of finding and retrieving images from database collections [5]. Retrieval efficiency still cannot meet users' needs when only one image feature is used. Zenggang, Zhiwen, and Xiaowen [6] proposed a technique that combines the cumulative histogram method with shape features based on Hu moments. Pradhan et al. [7] proposed a way to overcome the ineffectiveness of visual feature-based CBIR by extracting and combining ROA and non-ROA features. Their work extracts the image ROA to capture the image's semantic meaning, based on multi-directional texture features and color features derived from spatial correlation [7]. Ahmed, Ummesafi, and Iqbal [8] developed a CBIR system for object recognition and retrieval that fuses spatial color features with edge features, indexes images using a bag of visual words, and retrieves images with matching schemes. Wang et al. [9] proposed a CBIR method that finds the region of the object of interest using the Itti-Koch saliency model and then extracts the prominent image patterns using a multi-feature fusion scheme [9]. Dongmei Niu et al. used multi-feature fusion to develop an image retrieval method that extracts color differences using local binary patterns (LBP) after examining the direct relationships between shape and texture features and between color and texture features [10].
In his CBIR system, Alsmadi [11] employs descriptors and features of color, shape, and texture. Kayhan and Fekri-Ershad [12] suggested a CBIR system based on a weighted combination of color features, obtained with quantized color histograms, and texture features, obtained with local environmental differences patterns, modified local binary patterns, and GLCM. Varish [13] proposed a feature fusion scheme that extracts shape features by calculating invariant moments of multi-resolution sub-images at various levels; texture features were extracted using GLCM, and color features were extracted using a probability histogram model. Chu and Liu [14] used a multi-integration feature model to perform CBIR by extracting color and edge data and building a multi-integration feature histogram. Tena et al. [15] studied the application of feature extraction methods to fabric images, divided into conventional techniques and convolutional neural networks. Khan et al. [16] suggested a CBIR approach for image retrieval in a multi-class setting based on a hybrid feature descriptor using a genetic algorithm (GA) and an SVM classifier. Garg and Dhiman [17] perform the CBIR process by extracting and reducing several features to obtain a multilevel decomposition of the image using GLCM features and LBP variants. Subramanian et al. [18] proposed a way to increase the efficiency and effectiveness of a CBIR system by combining color, greyscale, texture, and shape features with a random forest classifier, using the Particle Swarm Optimization (PSO) algorithm to select informative features. Jardim et al. [19] developed an image retrieval system for the image property industry that applies multi-phase deep learning and image processing techniques to image signatures, providing an image representation precise enough that plagiarism of the industry's trademarks can be detected.
CBIR systems are often built on low-level features that are combined using fixed or equal weights. However, such systems extract information about the various features of an image's content without regard to their relative importance.
In many CBIR systems, the user must manually determine the relevance of low-level characteristics. The weights supplied by the user determine the quality of the results. However, determining the right weights for the features is quite difficult if the user is inexperienced. There are two techniques for assigning weights in CBIR: a linear method that describes a linear combination of each feature coefficient over many iterations, and a nonlinear method that combines various aspects of AI methods and the Relevance Feedback (RF) method [20].
In this paper, to replace the input given manually by the user, the weighting is automated using an artificial neural network (ANN). An artificial neural network is modeled on the human neural network and consists of a network of processing units. An ANN can change its structure based on external and internal information to solve problems. An ANN comprises three layers: the input, process, and output layers. The input layer contains the variables for the input data, the process layer contains the object recognition steps, and the output layer contains the object recognition results.
The paper is organized into four sections. The first section is this introduction, containing background and reviews of some related works. The second section explains the proposed methods for automatic weighting in image retrieval, including low-level feature methods and artificial neural networks. The third section presents the experimental results and discussion. Finally, the last section gives the conclusions and future work.

II. MATERIAL AND METHOD
The proposed method combines several methods, namely Color Moment, GLCM, and Hu Moment, which extract color, texture, and shape features, respectively. The weighting of the combined features is done automatically using an ANN, and the results are also compared with weighting that is done manually.
When a user enters a query image, the system extracts color, texture, and shape features using the methods discussed in the following sections. The feature extraction results are saved in a normalized feature vector, and the weight of each feature is determined automatically using the ANN. Euclidean distance is used to calculate similarity, and the result images relevant to the given query image are returned, as shown in Fig. 1.
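The normalization step mentioned above can be sketched as follows. This is a minimal illustration that assumes column-wise min-max scaling to [0, 1]; the text does not state which normalization is used, and the function name `min_max_normalize` and the sample values are hypothetical.

```python
import numpy as np

def min_max_normalize(features):
    """Scale each feature column to [0, 1] (assumed scheme, not confirmed by the text)."""
    features = np.asarray(features, dtype=np.float64)
    lo = features.min(axis=0)
    span = features.max(axis=0) - lo
    span[span == 0] = 1.0  # constant columns map to 0 instead of dividing by zero
    return (features - lo) / span

# Hypothetical raw features: one row per image, one column per feature
raw = np.array([[10.0, 0.2, 5.0],
                [20.0, 0.4, 5.0],
                [30.0, 0.8, 5.0]])
normalized = min_max_normalize(raw)
```

Scaling every feature to the same range keeps features with large raw magnitudes from dominating the Euclidean distance.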

A. Color Moments
Visual features (color, shape, and texture) and semantic features are the two kinds of features most often used in CBIR methods. Color is one of the most crucial visual features [21]. Color features are easier to extract than other features. In addition, color features do not depend on image transformations such as rotation, scaling, viewpoint changes, and other deformations, so they are robust. The mean, standard deviation, and skewness are the first-, second-, and third-order color moments, which represent the distribution of image color effectively and efficiently. Color moments are commonly used in image retrieval systems [22]. The three color moments are as follows, where $p_{ij}$ is the value of the i-th color channel at the j-th image pixel and $N$ is the number of pixels:
a. Mean color moment: the average value of the image color.
$$E_i = \frac{1}{N}\sum_{j=1}^{N} p_{ij} \quad (1)$$
b. Standard deviation moment: the square root of the variance of the distribution.
$$\sigma_i = \sqrt{\frac{1}{N}\sum_{j=1}^{N} \left(p_{ij} - E_i\right)^2} \quad (2)$$
c. Skewness moment: a measure of the distribution's degree of asymmetry.
$$s_i = \sqrt[3]{\frac{1}{N}\sum_{j=1}^{N} \left(p_{ij} - E_i\right)^3} \quad (3)$$
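The three color moments can be computed per channel as in the sketch below, which yields the nine color features for a three-channel image. It assumes the usual population (1/N) definitions and a signed cube root for skewness; the function name `color_moments` and the tiny example image are hypothetical.

```python
import numpy as np

def color_moments(image):
    """Return 9 features: mean, standard deviation, and skewness of each of 3 channels."""
    pixels = np.asarray(image, dtype=np.float64).reshape(-1, 3)  # shape (N, 3)
    mean = pixels.mean(axis=0)
    centered = pixels - mean
    std = np.sqrt((centered ** 2).mean(axis=0))
    # np.cbrt keeps the sign, so left-skewed channels give negative values
    skew = np.cbrt((centered ** 3).mean(axis=0))
    return np.concatenate([mean, std, skew])

# Hypothetical 2x2 image with 3 channels
example = np.array([[[0, 10, 255], [2, 10, 255]],
                    [[4, 10, 255], [10, 10, 255]]])
features = color_moments(example)
```

The output order here (three means, then three standard deviations, then three skewness values) is a layout choice for the sketch, not something the text specifies.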

B. Hu Moments
One of the fundamental characteristics used to depict objects is their shape. The use of shape features increases the accuracy and efficiency of the image retrieval process [23]. Shape characteristics, such as an approximation of objects by a collection of shapes, have been employed in numerous CBIR systems [24]. Hu Moments is a shape feature extraction method that consists of seven values calculated from central moments, whose values are not affected by image transformations [25]. The first six moments are invariant to translation, rotation, scale, and reflection. The seventh moment changes sign under reflection of the image.
The seven Hu moments are calculated from the normalized central moments $\eta_{pq}$ up to third order, where
$$\eta_{pq} = \frac{\mu_{pq}}{\mu_{00}^{\gamma}}, \quad \gamma = \frac{p+q}{2} + 1 \quad (11)$$
which is a normalization of the following central moment:
$$\mu_{pq} = \sum_{x}\sum_{y} (x - \bar{x})^p (y - \bar{y})^q f(x, y) \quad (12)$$
where $\bar{x}$ and $\bar{y}$ are the centroid coordinates of the area and $f(x, y)$ is the image intensity at pixel $(x, y)$.
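A minimal NumPy sketch of the seven shape features follows. It assumes Hu's standard definitions of the seven invariants in terms of normalized central moments; the function name `hu_moments` and the L-shaped test image are hypothetical, and a production system might instead call an image-processing library.

```python
import numpy as np

def hu_moments(image):
    """Seven Hu invariants of a 2-D intensity (or binary) image."""
    img = np.asarray(image, dtype=np.float64)
    y, x = np.mgrid[:img.shape[0], :img.shape[1]]
    m00 = img.sum()                      # equals the central moment mu_00
    xc, yc = (x * img).sum() / m00, (y * img).sum() / m00  # centroid

    def eta(p, q):
        """Normalized central moment of order (p, q)."""
        mu = ((x - xc) ** p * (y - yc) ** q * img).sum()
        return mu / m00 ** ((p + q) / 2 + 1)

    n20, n02, n11 = eta(2, 0), eta(0, 2), eta(1, 1)
    n30, n03, n21, n12 = eta(3, 0), eta(0, 3), eta(2, 1), eta(1, 2)
    a, b = n30 + n12, n21 + n03
    c, d = n30 - 3 * n12, 3 * n21 - n03
    return np.array([
        n20 + n02,
        (n20 - n02) ** 2 + 4 * n11 ** 2,
        c ** 2 + d ** 2,
        a ** 2 + b ** 2,
        c * a * (a ** 2 - 3 * b ** 2) + d * b * (3 * a ** 2 - b ** 2),
        (n20 - n02) * (a ** 2 - b ** 2) + 4 * n11 * a * b,
        d * a * (a ** 2 - 3 * b ** 2) - c * b * (3 * a ** 2 - b ** 2),
    ])

# Hypothetical binary L-shape used to illustrate the invariances
shape = np.zeros((12, 12))
shape[2:8, 2:4] = 1
shape[6:8, 2:7] = 1
shifted = np.roll(shape, (3, 4), axis=(0, 1))  # translated copy
mirrored = shape.T                             # reflected copy
```

Comparing `hu_moments(shape)` with `hu_moments(shifted)` and `hu_moments(mirrored)` illustrates the translation invariance and the sign change of the seventh moment under reflection.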

C. Grey Level Co-occurrence Matrix
A texture feature is a type of feature that is not based on brightness or color. It captures information about the surface and surroundings in the image, so that the image's spatial information can be described quantitatively [26]. The GLCM method is a statistical method for finding image texture features that are calculated from the gray-level distribution. The GLCM approach calculates an image region's contrast, roughness, and granularity based on the adjacency relationships between pixels. Image histograms are used for first-order feature extraction, while a co-occurrence matrix is used for second-order feature extraction. The co-occurrence matrix represents the various orientation directions and spatial distances [27]. Texture analysis with GLCM has fourteen parameters [28]. Huang et al. [29] found that only four uncorrelated features are needed to extract the texture features: energy, correlation, contrast, and entropy.
a. The angular second moment (energy) shows the uniformity and roughness of the image distribution.
b. The contrast shows image clarity and texture depth.
c. The correlation shows the degree of similarity between components in the row or column direction.
Each characteristic index is computed, and its mean and variance are calculated, resulting in a textural feature vector that does not depend on direction.
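The four texture features named by Huang et al. can be sketched with plain NumPy as below. The sketch assumes a symmetric, normalized co-occurrence matrix at distance 1, the common Haralick-style formulas for energy, contrast, correlation, and entropy, and averaging over four directions only (the variance over directions is omitted). The function names and the 4x4 example image are hypothetical.

```python
import numpy as np

def glcm(gray, dx, dy, levels):
    """Symmetric, normalized co-occurrence matrix for the pixel offset (dx, dy)."""
    g = np.asarray(gray).astype(int)
    h, w = g.shape
    # Coordinates of all pixels whose neighbour at the offset is inside the image
    ys, xs = np.mgrid[max(0, -dy):min(h, h - dy), max(0, -dx):min(w, w - dx)]
    src, dst = g[ys, xs], g[ys + dy, xs + dx]
    P = np.zeros((levels, levels))
    np.add.at(P, (src.ravel(), dst.ravel()), 1)  # count co-occurring gray levels
    P = P + P.T                                  # make the matrix symmetric
    return P / P.sum()

def texture_features(P):
    """Energy, contrast, correlation, entropy of a normalized GLCM.

    Correlation is undefined for a constant image (zero variance)."""
    i, j = np.indices(P.shape)
    energy = (P ** 2).sum()
    contrast = ((i - j) ** 2 * P).sum()
    mu_i, mu_j = (i * P).sum(), (j * P).sum()
    sd_i = np.sqrt(((i - mu_i) ** 2 * P).sum())
    sd_j = np.sqrt(((j - mu_j) ** 2 * P).sum())
    correlation = ((i - mu_i) * (j - mu_j) * P).sum() / (sd_i * sd_j)
    nz = P[P > 0]
    entropy = -(nz * np.log2(nz)).sum()
    return np.array([energy, contrast, correlation, entropy])

# Offsets (dx, dy) for 0, 45, 90, and 135 degrees at distance 1
OFFSETS = [(1, 0), (1, -1), (0, -1), (-1, -1)]

def glcm_features(gray, levels):
    """Average the four features over the four directions."""
    per_direction = [texture_features(glcm(gray, dx, dy, levels))
                     for dx, dy in OFFSETS]
    return np.mean(per_direction, axis=0)

# Hypothetical 4-level image
gray = np.array([[0, 0, 1, 1],
                 [0, 0, 1, 1],
                 [0, 2, 2, 2],
                 [2, 2, 3, 3]])
texture = glcm_features(gray, levels=4)
```

Averaging over the four directions is one way to obtain a direction-independent vector of four values; whether the paper also uses the per-direction variance is not clear from the text.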

D. Multi-Layer Perceptron Neural Network
Artificial neural network engineering has numerous applications in fields such as medicine, science, computing, engineering, nanotechnology, mining, agriculture, environment, climate, business, art, etc. Abiodun et al. [30] explained that when applied to human problems, neural network models such as feed-forward and feedback propagation artificial neural networks perform better.
A common type of ANN model is the feedforward neural network (FFNN), which can represent and approximate computational models using a parallel layered structure [31]. This structure consists of a series of fully connected layers, each with a set of neurons as processing elements. The Multi-Layer Perceptron (MLP) is a type of FFNN consisting of three parallel layers: input, hidden, and output. The connections between the layers of the MLP are defined by weights in the range [-1, 1]. Each node performs two tasks: summation and activation. First, the products of the inputs and weights are summed together with the bias, as in Eq. (17):
$$S_j = \sum_{i=1}^{n} w_{ij} x_i + b_j \quad (17)$$
where $n$ represents the number of inputs, $x_i$ represents input variable $i$, $b_j$ represents the bias, and $w_{ij}$ represents the connection weight. Second, the activation function is applied to the output of Eq. (17). The S-shaped sigmoid function has been the most widely used activation function in MLP [32]. Eq. (18) describes this function:
$$f_j(S_j) = \frac{1}{1 + e^{-S_j}} \quad (18)$$
As a result, Eq. (19) can be used to calculate the final output of neuron $j$:
$$y_j = f_j\left(\sum_{i=1}^{n} w_{ij} x_i + b_j\right) \quad (19)$$
The input layer receives the features, including the color moment, Hu moment, and GLCM features. The hidden layer size is obtained by trial and error, and the output layer has three neurons, which give the weight values for the fusion in the image retrieval model.
The learning steps are carried out to improve and update the network weights based on the ANN structure design. These weights are adjusted to estimate the results and minimize the errors.
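The summation and activation steps can be sketched as a forward pass through a 20-11-3 network. This is only an illustration: the weights are random rather than trained, the use of a sigmoid on the output layer is an assumption, and names such as `mlp_forward`, `W1`, and `b1` are hypothetical. Training (the learning step) is not shown.

```python
import numpy as np

def sigmoid(s):
    """S-shaped activation function."""
    return 1.0 / (1.0 + np.exp(-s))

def mlp_forward(x, W1, b1, W2, b2):
    """One hidden layer: weighted sum plus bias, then sigmoid, at each layer."""
    hidden = sigmoid(W1 @ x + b1)
    return sigmoid(W2 @ hidden + b2)  # output activation is an assumption

rng = np.random.default_rng(0)
n_in, n_hidden, n_out = 20, 11, 3     # sizes taken from the text

# Untrained parameters drawn from [-1, 1], for illustration only
W1 = rng.uniform(-1, 1, (n_hidden, n_in))
b1 = rng.uniform(-1, 1, n_hidden)
W2 = rng.uniform(-1, 1, (n_out, n_hidden))
b2 = rng.uniform(-1, 1, n_out)

x = rng.uniform(0, 1, n_in)           # stand-in for a normalized feature vector
weights = mlp_forward(x, W1, b1, W2, b2)
```

With a sigmoid output, each of the three values lies strictly between 0 and 1; nothing in this sketch forces them to sum to 1.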

III. RESULT AND DISCUSSION
This section discusses the results of the proposed system. The proposed system is implemented in Python, and its performance is assessed using precision-based evaluation metrics. Precision is a popular metric for assessing an image retrieval system, in which the relevant images serve as the basis for the computation. Precision is defined as the percentage of relevant responses among the retrieved items:
$$P = \frac{TP}{TP + FP} \quad (20)$$
where P stands for precision, TP denotes the number of relevant images retrieved (true positives), and TP + FP denotes the total number of images retrieved (true positives plus false positives).
$$AP = \frac{1}{n}\sum_{i=1}^{n} P_i \quad (21)$$
where AP stands for average precision, calculated as the mean of the precision values $P_i$ of the images in a class, and $n$ is the total number of images.
$$MAP = \frac{1}{m}\sum_{j=1}^{m} AP_j \quad (22)$$
where MAP represents the mean average precision, $AP_j$ represents the average precision of the j-th image class, and $m$ represents the total number of classes in the database.
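The three metrics can be sketched as plain Python functions. The function names and the example labels are hypothetical; the class label of each retrieved image is assumed to be known.

```python
def precision(retrieved_labels, query_label):
    """Fraction of retrieved images whose class matches the query: TP / (TP + FP)."""
    tp = sum(1 for label in retrieved_labels if label == query_label)
    return tp / len(retrieved_labels)

def average_precision(precisions):
    """Mean of the precision values of the query images of one class."""
    return sum(precisions) / len(precisions)

def mean_average_precision(ap_per_class):
    """Mean of the per-class average precision values."""
    return sum(ap_per_class) / len(ap_per_class)

# Hypothetical example: 4 images retrieved for a "bus" query
p = precision(["bus", "bus", "horse", "bus"], "bus")   # 3 of 4 relevant
ap_bus = average_precision([p, 1.0])                    # two "bus" queries
map_value = mean_average_precision([ap_bus, 0.5])       # two classes
```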

A. Dataset
The image database in this experiment consists of three datasets: synthetic, COIL-100, and Wang (Table 1). The synthetic dataset has 200 images of 64 x 64 pixels in four different classes. The COIL-100 dataset contains 7,200 images of 100 objects on a black background. Each object is placed on a turntable and rotated at 5-degree intervals, resulting in 72 images per object. The Wang dataset contains 1,000 JPEG images of 256 x 384 or 384 x 256 pixels, divided into ten classes of 100 images each: Africans, beaches, monuments, buses, dinosaurs, elephants, flowers, horses, mountains, and food.

Table 1 (columns: data set, example images): synthetic; Coil-100 (Columbia Object Image Library).

B. Image Retrieval Prototype Model Using ANN
Assigning weights $w_1$, $w_2$, and $w_3$ to the color moment, Hu moment, and GLCM methods gives the combined similarity value:
$$D(Q, I) = w_1 d_{CM}(Q, I) + w_2 d_{HM}(Q, I) + w_3 d_{GLCM}(Q, I) \quad (23)$$
where $Q$ is the query image and $I$ is a database image. In this scenario, all feature vector values have been normalized so that they lie in the range [0, 1].
The distance between the feature vectors of two images is measured using the Euclidean distance for each of the color moment, Hu moment, and GLCM features, namely
$$d(Q, I) = \sqrt{\sum_{k=1}^{K} (q_k - i_k)^2} \quad (24)$$
where $q_k$ and $i_k$ are the k-th feature values of the query and database images and $K$ is the number of features of that type. The weights $w_1$, $w_2$, $w_3$ in Eq. (23) can be obtained automatically using the multi-layer perceptron neural network. The details of the model design are shown in Table 2. The input layer receives 20 features: 9 color features, 7 Hu moment features, and 4 GLCM features. The hidden layer size is determined by trial and error, with 8 to 13 neurons tested, while the output layer has three neurons, which give the weight values $w_1$, $w_2$, $w_3$.
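The weighted combination of the three per-feature Euclidean distances can be sketched as follows. The layout of the 20-element vector (9 color, then 7 Hu, then 4 GLCM values) follows the order in which the text lists the features but is an assumption, as are the names `SLICES`, `combined_distance`, and `retrieve`.

```python
import numpy as np

# Assumed layout of the 20-element feature vector
SLICES = {
    "color": slice(0, 9),     # color moments
    "shape": slice(9, 16),    # Hu moments
    "texture": slice(16, 20), # GLCM features
}

def combined_distance(query, candidate, weights):
    """Weighted sum of per-feature-group Euclidean distances."""
    parts = [np.linalg.norm(query[s] - candidate[s]) for s in SLICES.values()]
    return float(np.dot(weights, parts))

def retrieve(query, database, weights, n=10):
    """Indices of the n database vectors closest to the query, nearest first."""
    dists = np.array([combined_distance(query, f, weights) for f in database])
    return np.argsort(dists)[:n]

# Hypothetical data: the second database vector differs only in its color part
query = np.zeros(20)
far = np.zeros(20)
far[SLICES["color"]] = 1.0
database = np.array([far, query.copy()])
ranking = retrieve(query, database, weights=(0.5, 0.3, 0.2), n=2)
```

Because each group has its own distance term, a weight of zero removes that feature type from the ranking entirely.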
From Table 2, the architecture of the ANN model can be described as shown in Fig. 3. The proposed algorithm for obtaining the automatic weights for image retrieval is shown in Fig. 4.

C. Result Analysis
Tables 3, 4, and 5 compare the MAP obtained using automatic weighting with that obtained using manual weighting, including the best target MAP value that can be obtained. The MAP Best Target value is the value obtained by selecting the best of 67 alternative weight parameters $w_1$, $w_2$, $w_3$ given manually. The selected weights $w_1$, $w_2$, $w_3$ are combinations of fractional values that total 100%, namely the triples (0,0,1), (0,0.1,0.9), (0,0.2,0.8), (0,0.3,0.7), …, (0.8,0,0.2), (0.8,0.1,0.1), (0.8,0.2,0), (0.9,0,0.1), (0.9,0.1,0), (1,0,0). The number of retrieved images is n = 10, 20, and 30. For the MLP parameters, a hidden layer size of 11 and a learning rate of 0.1 are chosen, which give the best MAP value.
Tables 3, 4, and 5 show that automating feature weights can improve image retrieval precision over manually assigning weights. The total of the combined weights is 1. As a result, the weight is 1 for a single method (CM, HM, or GLCM), 0.5 and 0.5 for a combination of two methods, and 0.33, 0.33, and 0.34 for a combination of three methods. Fig. 5 depicts a graph of the MAP value of each method. It can be seen that the fusion of two or three methods gives a better precision value than using a single method.
A retrieved image is considered correct if it belongs to the same category as the query. For each query, a predetermined number (n) of images is retrieved in ascending order of the Euclidean distance between the query and the database images. The top ten images were retrieved for three sample query images from the three datasets by applying the proposed image retrieval method with both automatic and manual weighting. Generally, the proposed method with automatic weighting gives more precise results than a single method, a combination of two methods, or a combination of three methods whose total weight is 1.
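The manual weight candidates listed above can be enumerated as in the sketch below. Stepping each weight by 0.1 yields 66 triples in the same order as the list in the text; adding the equal-weight triple (0.33, 0.33, 0.34) gives 67, which is one reading consistent with the stated count but is an assumption, since the text does not say how 67 is reached. The helper `best_weights` and its `score` argument are hypothetical.

```python
# All (w1, w2, w3) in steps of 0.1 that sum to 1, in the order listed in the text
grid = [(a / 10, b / 10, (10 - a - b) / 10)
        for a in range(11) for b in range(11 - a)]

# Assumption: the equal-weight triple is the 67th candidate
candidates = grid + [(0.33, 0.33, 0.34)]

def best_weights(candidates, score):
    """Return the candidate with the highest score.

    `score` is a hypothetical callable mapping a weight triple to a MAP value."""
    return max(candidates, key=score)

# Toy scoring function for illustration only: prefers balanced weights
toy_best = best_weights(candidates, score=lambda w: -max(w))
```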

IV. CONCLUSIONS
A content-based image retrieval system is proposed that combines color, texture, and shape features with automatic weighting using an ANN. The results were compared with manual weighting. The feature extraction methods used are the color moment, GLCM, and Hu moment. The initial experimental results show that the fusion of two or three methods gives a better precision value than using a single method. Automating feature weights can increase the precision of image retrieval compared to manually assigning weights. To improve retrieval accuracy, future work will involve experimenting with other methods for extracting color, texture, and shape features.
d. The entropy is used to show the randomness of the image texture. Features with rotational invariance are created by using offset parameters in four directions (0°, 45°, 90°, 135°).

Fig. 2 MLP Neural Network. Fig. 2 shows an MLP neural network with one hidden layer.

Fig. 4 Algorithm of image retrieval using automatic weight

Fig. 6 Example of the image retrieval results of the synthetic data set

TABLE II DESIGN OF PREDICTED MULTILAYER ANN MODEL
Take the image based on the similarity index.