Optimization of the Preprocessing Method for Edge Detection on Overlapping Cells at PAP Smear Images

Abstract: The complex structure of cells and their high degree of overlap cause poor image contrast, and such imaging factors make automatic visual interpretation more difficult. Segmentation separates a digital image into parts with homogeneous attributes so that different areas have different features. The challenges in nucleus segmentation on Pap Smear (PS) images are poor contrast, the presence of neutrophils, and uneven staining of overlapping cells. This research was conducted to improve image quality so that the nucleus can be identified accurately. The method used is the Polynomial Contrast Enhancement (PCE) model as a preprocessing approach. It changes the contrast of Pap smear images containing overlapping cells so that the contrast becomes significant enough to detect the edge of the nucleus. The detection process uses the Robert and Prewitt edge detection methods to test the identification of 797 nuclei in 252 PS images from the repository of the University of Nusa Mandiri (RepomedUNM). The best accuracy obtained is 86.8%. Comparing Robert's and Prewitt's edge detection shows that the PCE approach as a filter method can overcome color contrast problems and detect the nucleus more accurately. The difficulty of detecting the nucleus in PS images with overlapping cells can thus be addressed. During testing, the method distinguished overlapping cells from their nuclei, so it can serve as a reference for identifying cells with improved accuracy and for testing on larger data sets.


I. INTRODUCTION
Cervical Cancer (CC) is a very feared cancer [1], [2] and can only be detected through an early Pap Smear (PS) test [3]. PS will produce an image of cells found on the wall of a woman's uterus [4]. From the resulting image, normal and abnormal cells can be found [5], and these abnormal cells are called precancerous cells that can develop into CC [6].
Each cell has only one nucleus [7], which regulates all cell activities. Cells that are attacked by cancer cannot work normally [8], and their positions overlap [9]. Most cells are relatively thin and lie beneath the surrounding tissue [10], making identification difficult [11]. A segmentation technique in image processing is therefore needed to assist cell inspection and achieve higher accuracy [12]. However, existing techniques still have weaknesses, resulting in low accuracy in some cell classes, and some work well only on single or multiple cervical smear images [13].
Microscopic image segmentation is still needed to assist pathologists in their diagnostic process. The lower the color discrimination of the image, the more the accuracy is affected; research has been carried out on the recognition accuracy and integrity of the watershed segmentation algorithm [14]. This type of segmentation can recognize and distinguish every cell in a microscopic image, even those in contact [15]. Segmentation is used to detect the cytoplasm and nucleus on Pap smear images [16], [17] and to detect regions of interest (ROI), which is the basis of an automatic cervical cancer screening system. Effective segmentation can facilitate the extraction of meaningful information and simplify image data for further analysis [18], [19]. The results of nucleus segmentation are used for further processes such as distinguishing cells, the nucleus, and the background, or separating overlapping cells. The segmentation results must therefore be accurate; otherwise, errors will propagate to the next process. The first step in analyzing an image of overlapping cells is to segment the cervical cell nucleus. Image improvement methods are developed as a solution to the segmentation problem, which is constrained by complex color and background differences and by the irregularity of the cell shape, especially in overlapping cells, all of which make segmentation of Pap smear images difficult [20]. Image enhancement modifies digital images so that they can be analyzed further; it is considered significant for vision applications because it improves how well the image can be perceived [21].
Segmentation separates objects in a digital image into different parts with homogeneous attributes, and different areas have different features [22]. The challenges faced in segmenting are poor contrast, neutrophils, and uneven staining [23], so that each cell appears to have more than one nucleus, as shown in Fig. 1.
Fig. 1 The appearance of more than one nucleus in one cell
For this reason, a process for nucleus edge detection is needed so that each cell can be identified properly. Annotators often annotate points and connect them into an object frame to overcome difficulties in edge detection and identification and to improve accuracy [24]. The mean identification error of the object area is 3.4% across 10 data sets, which significantly harms benchmark stability and model evaluation accuracy [25]. This error rate can affect model application and selection [26], [27] as well as identification accuracy.
Edge detection is a challenging task in identifying objects in an image [28]. Edge detection techniques are generally based on gradients or image derivatives, so it is very important that edges in noisy images are identified correctly with respect to the desired region within the image boundary or contour [29]. Edge detection that gives good results requires procedures that focus on image quality settings [30]. The procedure performed before edge detection aims to overcome the uneven contrast [31], [32] introduced in the Pap smear image during acquisition.
The polynomial model was used in this study to effectively detect and separate the nucleus area on PS images, which could potentially aid early detection of cervical cancer and reduce the need for invasive interventions. Nonetheless, some issues need to be resolved in order to use this technique more effectively. One of the main issues is the challenge of selecting the appropriate feature extraction and polynomial model parameters for various PS images. In addition, nuclei can differ in size and shape, necessitating adaptation of the polynomial model to account for these variations.
A segmentation method that improves the PS image by combining several edge detection operators achieves an accuracy of 82.9% [33]. Other research develops models to effectively identify the nucleus areas of overlapping cells in PS images to facilitate the subsequent screening process [19]. Based on this, this research builds a Polynomial Contrast Enhancement (PCE) model for image improvement so that the nucleus can be detected. The model is evaluated by comparing Robert's and Prewitt's edge detection techniques, each combined with Imsharp so that the resulting image is sharper and the detection more accurate.

II. MATERIALS AND METHOD
The stages of this research are presented in Fig. 2. The final goal of each stage is to segment the detected area of each nucleus with the PCE approach as preprocessing. The PCE process improves the contrast of PS images that contain overlapping cells so that the edge of the nucleus is easier to detect.

A. Original Image
The original images are taken from the public RepomedUNM data set [34]. They are cell images from PS slides of the Special Laboratory of Veterans Pathology Bandung. Images were acquired using a Logitech HD Webcam C525 mounted on Olympus CH20 and Olympus CX21 microscopes at 40x magnification and saved in JPEG format. The test images are normal ThinPrep images, 252 images with a total of 797 nuclei.

B. Pre-processing
The preprocessing stage with PCE focuses on significant color (contrast) differences. The color difference determines whether the cell nucleus can be identified. The cytoplasm and background are given a lighter color so that yellow dominates them and red dominates the nucleus, based on the dominant color in the RGB image structure. The method in this study modifies the RGB pixel values of the PS image with a polynomial that functions as a filter on the image. This filter gave the best performance in the experiments, although the result depends on the degree of the polynomial. Therefore, it is advisable to test several degrees of the polynomial and choose the one that produces the most satisfactory result. In summary, color contrast analysis using polynomials can effectively increase color contrast in images, but care must be taken to avoid unwanted side effects.
Establishing the best value is very important because it influences the accuracy. Five layer values are used, with degrees 0, 1, 2, 3, and 4. These values were obtained from experiments carried out without changing the pixel positions (rows and columns), changing only the color.
In the preprocessing stage, initialization is done by determining the number of layers and the number of samples used to find the row positions in the polynomial, using Equations (1) and (2).
Where is the number of terms, is the row position, is the number of samples, and is the number of terms.
The polynomial matrix for color contrast consists of the polynomial coefficients used to calculate the new pixel values of the image after transformation [35]. Each pixel value in the image is changed using a polynomial of a given order defined by the matrix. Therefore, polynomial matrices can produce complex color value transformations in the image.
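The idea of changing each pixel value through a polynomial can be sketched as follows. This is a minimal illustration, not the PCE model itself: the function name `polynomial_contrast`, the normalization to [0, 1], and the coefficient values are assumptions chosen for the example, since Equations (1) to (13) are not reproduced here.

```python
import numpy as np

def polynomial_contrast(channel, coeffs):
    """Map one uint8 channel through a polynomial of the normalized intensity.

    coeffs are hypothetical values ordered from the constant term upward
    (c0 + c1*x + c2*x**2 + ...); they are not the coefficients of the paper.
    """
    x = channel.astype(np.float64) / 255.0            # normalize to [0, 1]
    y = np.polynomial.polynomial.polyval(x, coeffs)   # evaluate per pixel
    y = np.clip(y, 0.0, 1.0)                          # keep values in range
    return np.round(y * 255.0).astype(np.uint8)

# Illustrative S-shaped curve 3x^2 - 2x^3: dark pixels get darker,
# bright pixels get brighter, pixel positions are unchanged.
coeffs = [0.0, 0.0, 3.0, -2.0]
channel = np.array([[0, 64, 128, 192, 255]], dtype=np.uint8)
enhanced = polynomial_contrast(channel, coeffs)
print(enhanced)
```

Applying the same function to the red, green, and blue channels separately changes only the colors, which matches the constraint that rows and columns stay fixed.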
The matrix is initially formed from the calculation result pk = [1, 2, 3, 4, 5] and then filled with the values calculated from Equation (2); the process is repeated five times according to the number of layers. The results of this matrix are presented in Fig. 3.
Fig. 3 The result of the calculation of the 5 x 5 matrix
A very significant contrast difference is produced, which allows this process to identify the cell nucleus. The background color is changed to a lighter color based on the RGB color structure. The contrast changes result from test values mapped to a matrix of order 5 x 5. The matrix consists of layers arranged according to the layer number, with five layers in total.
Next, the components of each channel (red, green, and blue) are initialized for the contrast improvement test, which gives the Basel matrix of each component of the new channel, defined by rgbx. This new channel forms a new image with a coefficient of 4.5, which is converted to an integer.
Table 1.
The rx, gx, and bx values are then derived from the resulting matrices. These values are switched to change the degree (power). The resulting matrix is squared, and the squares are then summed and multiplied by the initial matrix ( ). This result is reversed into a Basel number using Equations (8), (9), and (10).
The Basel number is multiplied by the initial matrix of the pre-arranged red, green, and blue color components (rb, gb, and bb) to obtain the rx, gx, and bx components (5 rows and 1 column) using Equations (11), (12), and (13). In these equations, the contrast of the image is corrected for the components of each channel with a fixed polynomial coefficient of 4.5, which was determined by experiment, and the input image is converted to type uint8.
The resulting component values are shown in Table 2 and Table 3. The best image value for each RGB component can be determined for a layer with a degree of 5. This step gives d = 4.5000, g = 199 x 297 x 3 double, and w = 199 x 297 x 3 double, where g is the input and w is the result. The preprocessed RGB images are presented in Fig. 4, where the contrast is very significant: the cytoplasm is predominantly yellow and the nucleus predominantly red. After modifying the contrast setting, only the nucleus appears in the simulation image [36]. The bounding boxes of each nucleus identified using Robert's edge detection (DT Robert) and Prewitt's edge detection (DT Prewitt), comparing images before and after the pixel value repair process, are presented in Fig. 5. The metric values summarize the effectiveness of the Prewitt and Robert operators in detecting edges in a particular image, and analyzing them helps identify the superior operator for that image. Nevertheless, these outcomes are limited to the tested images and may not hold for other images with distinct features, so the performance of the edge detection operators should be verified on different types of images. Fig. 5 shows that the image after pixel value repair allows the nucleus to be identified better.

C. Edge detection
Two edge detection methods are used, Robert and Prewitt. The results of these two methods are compared to determine the best accuracy [37]. The two methods have different operators and do not always give the same results for each type of image, including Pap smear images. The results of the two edge detection analyses for Pap smear images depend on several factors, such as image quality, resolution, and brightness level. In addition, there is a subjective factor in assessing the edge detection results because the interpretation of an edge may vary depending on the needs of the application. The operator values are presented in Fig. 6.
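As an illustration of how the two operators differ, the sketch below applies the standard 2 x 2 Roberts cross kernels (the operator called Robert in this paper) and the standard 3 x 3 Prewitt kernels to a synthetic step edge. The kernel values are the textbook definitions and are assumed to correspond to the operator values of this study; the helper names `correlate_valid` and `gradient_magnitude` are hypothetical. The kernels are applied by correlation without flipping, which changes only the sign of the response and not the gradient magnitude.

```python
import numpy as np

# Textbook Roberts cross kernels (2 x 2).
ROBERTS_X = np.array([[1, 0], [0, -1]], dtype=float)
ROBERTS_Y = np.array([[0, 1], [-1, 0]], dtype=float)

# Textbook Prewitt kernels (3 x 3).
PREWITT_X = np.array([[-1, 0, 1], [-1, 0, 1], [-1, 0, 1]], dtype=float)
PREWITT_Y = np.array([[-1, -1, -1], [0, 0, 0], [1, 1, 1]], dtype=float)

def correlate_valid(image, kernel):
    """Slide the kernel over the image without padding (valid region only)."""
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(kh):
        for j in range(kw):
            out += kernel[i, j] * image[i:i + out.shape[0], j:j + out.shape[1]]
    return out

def gradient_magnitude(image, kx, ky):
    """Combine the two directional responses into one edge strength map."""
    gx = correlate_valid(image, kx)
    gy = correlate_valid(image, ky)
    return np.hypot(gx, gy)

# Synthetic test image: dark left half, bright right half (vertical edge).
img = np.zeros((5, 6))
img[:, 3:] = 1.0

roberts = gradient_magnitude(img, ROBERTS_X, ROBERTS_Y)
prewitt = gradient_magnitude(img, PREWITT_X, PREWITT_Y)
print(roberts)
print(prewitt)
```

On this step edge the Roberts response is one pixel wide, while the Prewitt response spans two columns because its kernel averages over a larger neighborhood.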

D. Segmented Image
When segmenting images using polynomials, it is crucial to carefully choose the relevant features for extraction and to determine the polynomial model parameters that best fit the specific PS image being segmented. The input image at this segmentation stage is the color image resulting from preprocessing. Objects are separated based on the Red, Green, and Blue color components [38]. The grayscale value is obtained as the weighted sum of the R, G, and B components using Equation (14).
Here the result of Equation (14) is the pixel value of the grayscale image, and R, G, and B are the pixel values of the red, green, and blue colors. Next, the convolution, thresholding, morphology, and bounding box processes are carried out. Convolution adjusts the pixel values according to the kernel of each operator, Robert and Prewitt, while thresholding separates the pixels of the nucleus object from the background based on the lighting level or brightness.
Noise, that is, pixels that do not belong to a nucleus, is removed from the image by the morphology process. After the nucleus area has been identified accurately, a bounding box is computed for each nucleus. From the bounding box, the area of each object can be identified.
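The grayscale conversion and thresholding steps can be sketched as below. The weights are the common ITU-R BT.601 luma weights and are an assumption, because the weights of Equation (14) are not reproduced here; the threshold level of 128 and the function names are likewise hypothetical.

```python
import numpy as np

# Assumed luma weights (ITU-R BT.601), not necessarily those of Equation (14).
WEIGHTS = np.array([0.299, 0.587, 0.114])

def to_grayscale(rgb):
    """Weighted sum of the R, G, and B components of an H x W x 3 image."""
    return rgb.astype(np.float64) @ WEIGHTS

def threshold(gray, level):
    """Binary mask that is True where a pixel is darker than the level."""
    return gray < level

# Tiny test image: yellow (cytoplasm-like), dark red (nucleus-like),
# white background, and black.
rgb = np.array([[[255, 255, 0], [200, 0, 0]],
                [[255, 255, 255], [0, 0, 0]]], dtype=np.uint8)
gray = to_grayscale(rgb)
mask = threshold(gray, 128)   # 128 is an illustrative level
print(gray)
print(mask)
```

With these weights the yellow pixel stays bright and the red pixel becomes dark, so a single brightness threshold separates the two, which is the effect the preprocessing colors are intended to produce.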
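The noise removal and bounding box steps can be sketched as follows. As a simplification, this example drops connected regions below a minimum area instead of applying a morphological operator, so it is a stand-in for the morphology process rather than the procedure of this study; the function name `bounding_boxes` and the `min_area` cutoff are hypothetical.

```python
from collections import deque

def bounding_boxes(mask, min_area=2):
    """Return (top, left, bottom, right, area) for each 4-connected region.

    Regions smaller than min_area (a hypothetical noise cutoff) are dropped.
    """
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for r in range(h):
        for c in range(w):
            if not mask[r][c] or seen[r][c]:
                continue
            # Breadth-first search over the region that contains (r, c).
            queue = deque([(r, c)])
            seen[r][c] = True
            top, left, bottom, right, area = r, c, r, c, 0
            while queue:
                y, x = queue.popleft()
                area += 1
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if area >= min_area:
                boxes.append((top, left, bottom, right, area))
    return boxes

# Binary mask with two nucleus-like regions and one isolated noise pixel.
mask = [
    [1, 1, 0, 0, 0],
    [1, 1, 0, 0, 1],
    [0, 0, 0, 0, 0],
    [0, 0, 1, 1, 1],
]
boxes = bounding_boxes(mask)
print(boxes)  # [(0, 0, 1, 1, 4), (3, 2, 3, 4, 3)]
```

Each tuple gives the box corners and the pixel area of one region, so the area of every detected object can be read directly from the result.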

III. RESULTS AND DISCUSSION
This section presents the overall results of the nucleus detection process. The PCE model provides nucleus segmentation results on PS test images with overlapping cells without involving the user in identification. The method is non-learning. The recapitulation of performance calculations is presented in Table 4. Table 5. The results of edge detection using the Prewitt and Robert operators can be analyzed by computing the number of TP, FP, FN, and TN from edge detection on the image used, based on the confusion matrix. The stages of the test results on the tested image are presented in Table 6. In determining which nuclei were successfully detected, all cell nuclei were assigned a 100% determination. Accuracy is calculated using Equation (15). The performance results were evaluated using Precision and Recall with Equation (16) and Equation (17).
The calculated performance evaluation values with the coefficients are presented in Table 7. The accuracy of nucleus detection with the Polynomial model is 86.8% with Robert's edge detection and 85.5% with Prewitt's edge detection. These results show that the combination of the Polynomial model and Robert's edge detection has a better accuracy rate than the combination with Prewitt's edge detection.
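For reference, accuracy, precision, and recall can be computed from the confusion matrix counts as sketched below, using their standard definitions, which are assumed to match Equations (15) to (17). The counts in the example are hypothetical and are not the values reported in the tables of this study.

```python
def detection_metrics(tp, fp, fn, tn):
    """Standard accuracy, precision, and recall from confusion matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, precision, recall

# Hypothetical counts for illustration only.
accuracy, precision, recall = detection_metrics(tp=80, fp=10, fn=5, tn=5)
print(f"accuracy={accuracy:.3f} precision={precision:.3f} recall={recall:.3f}")
```

Reporting precision and recall alongside accuracy shows whether errors come mainly from false detections or from missed nuclei.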

IV. CONCLUSIONS
This study used PS images from the public RepomedUNM data, with normal ThinPrep test images totaling 252 images and 797 nuclei. The use of polynomials for the detection of nuclei in PS images has significant potential to increase the speed and accuracy of the image segmentation process. A comparison of Robert's and Prewitt's edge detection in this study shows that the Polynomial Contrast Enhancement (PCE) approach as a filter method can overcome the problem of color contrast and detect the nucleus more accurately. The accuracy of Prewitt's edge detection is 85.5%, which is lower than the 86.8% of Robert's edge detection. This suggests that Prewitt's edge detection tends to be less sensitive to noise in PS images. The difficulty of detecting the nucleus in PS images provides a basis for further research on detecting objects against complex and diverse backgrounds. It is important to choose the right edge detection technique for a particular application and to optimize its parameters to achieve high edge detection accuracy. Future research can focus on developing more sophisticated image segmentation techniques to improve nucleus detection in Pap smear images. This research also has the potential to be developed into a more specific model that identifies variations in the size and shape of the nucleus in PS images.