Corrugation and Squat Classification and Detection with VGG16 and YOLOv5 Neural Network Models

Abstract: Railway track defects in Malaysia pose significant risks of train derailments and accidents, underscoring the urgency of early and accurate defect detection and classification. This study presents a novel approach that uses two deep learning models, VGG16 and YOLOv5, to detect and classify railway track defects, focusing explicitly on corrugation and squat defects. The research's uniqueness lies in its application of these specific models and in the composition of a dataset collected from extensive field measurements and inspections across various railway tracks within the Track Network Maintenance Ampang Line in Malaysia. The results demonstrate that these models achieve precision above 80% in both defect classification and defect detection. The proposed methodology provides the railway industry with a tool to streamline maintenance planning and prioritize defect remediation efficiently. Early defect detection can prevent potential accidents and improve safety and operational efficiency. Future studies can expand on these findings by extending the proposed techniques to other types of rail defects. Incorporating a diverse range of scenarios and operating conditions in the dataset could further enhance the models' performance and generalization.


I. INTRODUCTION
Advancement in autonomous and engineering technology is crucial in detecting and handling defects in railway tracks, vehicles, and other infrastructure [1]-[5]. These defects can lead to incidents, delays, and other problems, so it is essential to have effective frameworks in place to identify and repair defects as soon as possible. In Malaysia, railway defect detection is a time-consuming and labor-intensive process that requires skilled personnel to inspect each section of the track manually [1], [2]. This makes it difficult to detect and repair defects promptly, leading to costly service disruptions and potential safety hazards.
To address this issue, Malaysia's railway operators are looking for innovative methods to detect rail surface defects.
One research study investigated the potential of rolling noise to detect rail surface defects on wayside rail tracks in Malaysia. The study conducted field investigations and measured the noise characteristics response under different rail conditions, both with and without defects. The results showed that rail conditions affect the peak frequency and noise amplitude, with surface defects exhibiting a higher peak frequency than defect-free surfaces [1].
While the most common track defects in Malaysia were not explicitly mentioned in [1], it is essential to note that rail surface defects such as corrugation and squats are examples of railway defects that can compromise the safety or reliability of the system if not detected and repaired in time. Various non-destructive testing (NDT) techniques and technologies, including ultrasonic testing, electromagnetic testing, eddy current testing, and visual inspection, have been explored for rail defect detection [3], [6]. These methods enable the detection of different defects on the surface and within the rail structure [7].
Visual inspection can be used to focus further testing by other methods and is frequently used as an initial screening approach to discover, for example, cracks, corrugations, and squat deficiencies [8]. Two categories of railway defects are shown in Fig. 1: (a) corrugation and (b) squat. Visual inspection used to apply only to human-accessible portions of structures and remains restricted to surface-breaking flaws. However, recent endoscope improvements have substantially expanded the prospective scope of visual examination [8], [9] and of categorization using machine learning approaches such as decision trees [8].
This paper presents a comprehensive analysis that evaluates the reliability of state-of-the-art neural network models for classifying railway defects. The remainder of Section I reviews previous work on railway defect detection systems. Section II outlines the methodology employed in this study, including the research design, data collection, and analysis techniques. The results and analysis of the models are presented in Section III, followed by a discussion of the findings. Finally, Section IV presents the conclusion of this study and provides recommendations for future research in this area.

A. Advances In Non-Destructive Testing (NDT) Techniques for Rail Surface Defect Detection
There has been a growing interest in advanced non-destructive testing (NDT) techniques for detecting rail surface defects in recent years. The traditional manual inspection method has proven time-consuming, labor-intensive, and hazardous for railway workers. Moreover, the subjective nature of human detection makes it challenging to obtain objective results and identify defective locations accurately. These limitations contribute to the inefficiency, inconvenience, and potential for human error in the current detection method.
Researchers have explored various non-destructive testing techniques that leverage sensor information and intelligent algorithms to overcome these challenges. These advanced techniques aim to improve the efficiency and accuracy of rail surface defect detection. Among the widely employed NDT techniques in the railway industry, notable methods include ultrasonic inspection [10], [11], electromagnetic-induced thermoacoustic inspection [12], eddy current inspection [13], and visual inspection [8], [14]. Fig. 2 shows the present technology utilized for detecting rail surface defects using (a) ultrasonic inspection and (b) machine visual inspection.
Ultrasonic inspection, as shown in Fig. 2(a), is a technique commonly used to detect rail surface defects. This method uses ultrasonic waves to inspect the rail's internal and external conditions. Ultrasonic waves are introduced into the rail, and their behavior is monitored to identify any irregularities or defects in the rail's structure. Despite its effectiveness, ultrasonic inspection has limitations, such as a low signal-to-noise ratio, which can lead to missed or misclassified defects. Additionally, the complexity and variability of rail surface conditions can make it challenging to identify and classify defects accurately [10]. Fig. 2(b) shows machine visual inspection, which uses computer vision and imaging systems to analyze the rail's surface and detect defects visually [15]-[18]. This method relies on cameras and image processing algorithms to capture images of the rail surface and identify irregularities that may indicate defects. The development of information technology, communication technology, and sensor technology has led to the evolution of rail health monitoring technology.
Fig. 2 The present technology utilized for the detection of rail surface defects: (a) ultrasonic inspection [3]; and (b) machine visual inspection [19]
These technologies, which often incorporate machine visual inspection techniques, have become significant and challenging because they offer real-time detection capabilities and can provide risk warning forecasts. Railway inspections have been enhanced by the ability to identify rail defects visually. The authors of [8] recognized the potential of vision inspection combined with eddy current inspection as an alternative to the manual visual check. They proposed replacing the manual visual check with an automatic visual inspection based on the Spectral Image Differencing Procedure (SIDP) and a decision tree method. During the one-week experiment, 1024 rails (60 to 120 m in length) were tested using three methods: automated visual testing (VT), manual visual testing (MVT), and eddy current testing (ET). According to [8], approximately 4.1% of the rails had surface defects; among them, 4.8% were detected exclusively by VT, while 2.4% were found only by the older method. These figures suggest that VT is more sensitive than MVT and ET and can detect faults that the other approaches miss. Considerable research effort has since been invested in automatic visual inspection.
Researchers are looking for new ways to enhance the accuracy and efficiency of rail surface defect detection through NDT techniques. Advances in intelligent algorithms such as deep learning [15], [16] and image processing [19] have created opportunities for automated classification and detection of rail surface defects. Applying these techniques could make defect detection more accurate, efficient, and objective, improving the safety and reliability of railway transportation systems.
Several studies have applied computer vision to specific rail defects. The study in [20] introduced a computer vision-based method that used feature descriptors and a support vector machine (SVM) to classify rail corrugation, achieving 97.6% accuracy on 5645 corrugated rail samples. The Residual Network 50 (ResNet50) technique with transfer learning was applied in [21]; by drawing on knowledge from related domains, it reached 88.6% classification accuracy on a dataset of 25,000 loose ballast samples, although the uncontrolled, real-world data collected across diverse continents and weather conditions posed challenges. For fastener defects, [22] combined the Faster Region-based Convolutional Neural Network (Faster-RCNN) with a Bag of Visual Words (BOVW) model and achieved 97.9% accuracy using 3155 fastener images. In addition, [23] applied YOLOv3 to rail surface defect detection and reported 99% accuracy using 195 sample images. Squat defects, by contrast, remain comparatively unexplored. This research gap motivates the pursuit of effective methods for identifying and predicting squat defects on railway tracks, thereby improving railway safety and efficiency.
The integration of computer vision technology has shown promising results in detecting rail defects, including corrugation and squat [24]. However, few reports on rail corrugation and squat identification are based on automatic visual inspection [24]. Some existing methods, such as the one presented in [20], have shown potential for classifying and detecting rail corrugation using computer vision algorithms. This study aims to analyze the effectiveness of computer vision and deep learning techniques in detecting and classifying two critical types of railway track defects, corrugation and squat, which can cause significant damage and disrupt train operations [25], [26]. The research uses existing computer vision and deep learning models to investigate the classification and detection of these defects. While integrating computer vision and machine learning techniques has advanced rail inspection, improvements are still needed in identifying and analyzing rail corrugation and squat, which are vital for rail safety and maintenance. By refining these techniques, rail inspection capabilities can be enhanced, ensuring the safety and reliability of railway transportation. These studies and methods will serve as benchmarks for future classification techniques in the rail industry.

II. MATERIAL AND METHODS
This section presents a novel approach to studying railway corrugation and squat defect classification. The research uses a unique setup and configuration, employing VGG16 for classification and YOLOv5 for detection. Its significance lies in showing how state-of-the-art models perform in classifying and detecting these defects on a customized dataset. While previous studies [23], [24] also used a YOLO family model, our approach differs in model architecture and dataset composition. This distinction is crucial because, although the defects are the same, the view angles may differ from those in other rail defect datasets.

A. Experimental Setup
A specialist conducted extensive field measurements and inspections across various railway tracks within the Track Network Maintenance Ampang Line in Malaysia to acquire a comprehensive and inclusive dataset. This effort aimed to ensure that the dataset encompassed a diverse range of samples, accurately representing the conditions observed in the railway system. Specialized measurement equipment and sensors were employed to capture high-resolution images of the tracks, focusing on areas prone to corrugation and squat defects. Expert specialists thoroughly tagged and annotated the acquired images to support later training and evaluation. The dataset contained 5778 images, with 2907 images representing corrugation defects and 2871 representing squat defects. The dataset was divided into training (70%), validation (20%), and testing (10%) sets [27]. Training was executed in Google Colaboratory, a cloud-based integrated development environment (IDE) provided by Google [28]. The runtime was set to a graphics processing unit (GPU) with 12 GB of RAM and 78 GB of disk memory.
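The 70/20/10 split described above can be sketched as follows. This is a minimal illustration rather than the authors' script: the function name `split_dataset`, the fixed seed, and the placeholder file names are assumptions, and only the class counts (2907 corrugation, 2871 squat) and the split ratios come from the text.

```python
import random

def split_dataset(items, train_frac=0.7, val_frac=0.2, seed=42):
    """Shuffle items and split them into train/validation/test lists.

    The test set receives whatever remains after the train and
    validation portions are taken, so no item is dropped.
    """
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_train = int(len(shuffled) * train_frac)
    n_val = int(len(shuffled) * val_frac)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, val, test

# Placeholder file names standing in for the 5778 labelled images.
images = [(f"corrugation_{i}.jpg", "corrugation") for i in range(2907)]
images += [(f"squat_{i}.jpg", "squat") for i in range(2871)]

train, val, test = split_dataset(images)
print(len(train), len(val), len(test))
```

Under this rounding choice (fractions truncated, remainder assigned to the test set), the 5778 images divide into 4044 training, 1155 validation, and 579 test images. A stratified split per class would be a reasonable alternative; the text does not say which was used.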
As a preliminary step before model training, the collected images underwent pre-processing procedures to improve their quality and enable seamless integration with the VGG16 and YOLOv5 architectures.
Pre-processing techniques encompassed resizing the images to a standardized resolution of 224 × 224, normalizing pixel values, and augmenting the dataset through transformations like rotation, scaling, and flipping. These pre-processing steps were implemented to enhance the model's performance and facilitate its generalization ability.
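The pre-processing steps above can be illustrated with a small dependency-free sketch. Several details are assumptions made for illustration: the images are treated as 2-D grayscale grids, resizing uses nearest-neighbour sampling, normalization maps 8-bit values to [0, 1], and only horizontal flipping is shown as an augmentation. The text specifies the 224 × 224 target size but not these choices, and a real pipeline would normally rely on an image library.

```python
def resize_nearest(image, out_h=224, out_w=224):
    """Resize a 2-D grayscale image (list of rows) by nearest-neighbour sampling."""
    in_h, in_w = len(image), len(image[0])
    return [
        [image[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
        for r in range(out_h)
    ]

def normalize(image):
    """Scale 8-bit pixel values from [0, 255] to [0.0, 1.0]."""
    return [[px / 255.0 for px in row] for row in image]

def flip_horizontal(image):
    """Mirror the image left to right (one simple augmentation)."""
    return [row[::-1] for row in image]

# Synthetic 448 x 448 gradient image used only for demonstration.
raw = [[(r + c) % 256 for c in range(448)] for r in range(448)]
resized = resize_nearest(raw)
scaled = normalize(resized)
augmented = flip_horizontal(scaled)
print(len(augmented), len(augmented[0]))
```

Augmentation is applied after resizing and normalization here so that every variant fed to the network has the same shape and value range.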
For the classification of railway corrugation and squat defects, the VGG16 model was chosen due to its efficacy in image classification tasks [29]. The model was trained on the pre-processed dataset using transfer learning. Its weights were initialized from weights pre-trained on a large-scale image dataset, and the model was then fine-tuned on the railway-specific dataset to adapt it to classifying corrugation and squat defects. This optimization is illustrated starting in line 7 of the pseudocode for the VGG16 model.
Pseudocode for VGG16 model training:
The YOLOv5 model, recognized for its real-time object detection capabilities, was employed to detect defects. The model was trained to locate and classify corrugation and squat defects within the railway images. Like the VGG16 model, the YOLOv5 model underwent transfer learning: pre-trained weights were used for initialization, and the model was subsequently optimized using the annotated dataset of corrugation and squat defects, as depicted in line 7 of the YOLOv5 model's pseudocode.
Pseudocode for YOLOv5 model training:
The optimization criteria include minimizing the categorical cross-entropy loss using the Adam optimizer. A learning rate schedule, controlled by the ReduceLROnPlateau callback, dynamically adjusts the learning rate during training based on the validation loss. Fine-tuning involves selectively unfreezing specific layers to adapt the pre-trained model to the railway-specific task.
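The idea behind a reduce-on-plateau learning rate schedule can be sketched in plain Python. This is a simplified imitation of the behaviour, not the Keras callback itself and not the authors' configuration: the function name, the initial rate, `factor`, `patience`, `min_lr`, and the validation losses are all hypothetical values chosen for illustration.

```python
def reduce_lr_on_plateau(val_losses, initial_lr=1e-3, factor=0.5,
                         patience=2, min_lr=1e-6):
    """Return the learning rate in effect at each epoch.

    The rate is multiplied by `factor` whenever the validation loss has
    failed to improve on its best value for `patience` consecutive epochs.
    """
    lr = initial_lr
    best = float("inf")
    wait = 0
    schedule = []
    for loss in val_losses:
        schedule.append(lr)          # rate used during this epoch
        if loss < best:
            best = loss              # new best loss: reset the counter
            wait = 0
        else:
            wait += 1
            if wait >= patience:     # plateau detected: shrink the rate
                lr = max(lr * factor, min_lr)
                wait = 0
    return schedule

# Hypothetical validation losses containing two stalls.
losses = [0.90, 0.70, 0.60, 0.61, 0.62, 0.59, 0.60, 0.60]
schedule = reduce_lr_on_plateau(losses)
print(schedule)
```

With these inputs the rate stays at its initial value for the first five epochs and is halved from the sixth epoch onward, after two consecutive epochs without improvement.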
A rigorous assessment using various metrics was conducted to comprehensively evaluate the trained models' performance. For the VGG16 model, key metrics such as classification accuracy, precision, recall, and F1-score were computed. These metrics provided insight into the model's ability to classify the defect categories accurately. A confusion matrix was also generated to analyze the model's performance further and identify misclassifications.
Pseudocode for VGG16 model evaluation:
  Calculate and store:
    The loss and accuracy of the model on the training data (train evaluation)
    The loss and accuracy of the model on the validation data (validation evaluation)
    The loss and accuracy of the model on the test data (test evaluation)
  Generate a classification report with precision, recall, F1-score, and support for each class (classification report):
    Make predictions on the test data using the model (predictions)
    Extract the predicted labels by choosing the class with the highest probability
    Obtain the true labels from the test data
    Generate a classification report using the true labels and predicted labels
  Create a confusion matrix to visualize the performance of the model across different classes (confusion matrix):
    Generate the confusion matrix using the true labels and predicted labels
For the YOLOv5 model, detection-specific metrics were employed to evaluate its defect localization and classification accuracy. Average precision, mean average precision (mAP), and intersection over union (IoU) were used to quantify the model's effectiveness in detecting and localizing defects within the railway images. This experimental setup advanced comprehension and capabilities in railway defect classification and detection. The integration of VGG16 for classification and YOLOv5 for detection offered a robust methodology to tackle the challenges associated with railway corrugation and squat defects.
Pseudocode for YOLOv5 model evaluation:
  Calculate the mean average precision (mAP) by evaluating the model on the test data
  Store the evaluation results in the variable results
  Retrieve the mAP value from the results using the key 'mAP'
  Create a confusion matrix to evaluate the model's performance:
    Make predictions using the model on the test data
    Extract the predicted labels from the predictions and the true labels from the test data
    Generate the confusion matrix using the true labels and predicted labels
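The confusion matrix and classification report steps in the evaluation pseudocode can be made concrete with a short, dependency-free sketch. The function names and the toy labels below are hypothetical; they are not the paper's test set, and a library such as scikit-learn would normally be used in practice.

```python
def confusion_matrix(y_true, y_pred, labels):
    """Count predictions; rows are true classes, columns are predicted classes."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix

def classification_report(matrix, labels):
    """Compute per-class precision, recall, F1-score, and support."""
    report = {}
    for i, label in enumerate(labels):
        tp = matrix[i][i]
        predicted = sum(row[i] for row in matrix)   # column sum
        actual = sum(matrix[i])                     # row sum (support)
        precision = tp / predicted if predicted else 0.0
        recall = tp / actual if actual else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        report[label] = {"precision": precision, "recall": recall,
                         "f1": f1, "support": actual}
    return report

labels = ["corrugation", "squat"]
# Hypothetical toy labels, not the paper's test data.
y_true = ["corrugation"] * 5 + ["squat"] * 5
y_pred = ["corrugation"] * 4 + ["squat"] + ["squat"] * 4 + ["corrugation"]
cm = confusion_matrix(y_true, y_pred, labels)
report = classification_report(cm, labels)
print(cm)
print(report["corrugation"])
```

Deriving the report from the confusion matrix keeps the two outputs consistent: precision comes from a column of the matrix and recall from a row.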

III. RESULTS AND DISCUSSION
The study's outcomes using VGG16 for classification indicate high accuracy and performance in distinguishing between railway corrugation and squat defects. Fig. 3 shows samples of predicted test images for (a) classification using VGG16 and (b) detection using YOLOv5. Table 1 presents the evaluation results of the VGG16 and YOLOv5 models for the corrugation and squat classes. VGG16 achieved a precision of 0.90 and 0.87 for corrugation and squat defects, respectively. The recall values were 0.86 and 0.91, resulting in F1-scores of 0.88 and 0.89, respectively. These scores demonstrate the ability of VGG16 to achieve a balanced trade-off between precision and recall. The YOLOv5 model exhibited precision values of 0.84 for corrugation and 0.95 for squat, with mAP@0.5 scores of 0.93 for corrugation and 0.96 for squat. Both models showed competitive performance in detecting rail defects, demonstrating the potential of computer vision and deep learning techniques for enhancing railway safety and maintenance.
Precision and recall are often used to assess a model's performance in classification tasks. Recall is the percentage of correctly predicted positive instances out of all actual positive instances. In contrast, precision is the percentage of correctly predicted positive instances out of all instances predicted as positive. The recall values of 0.86 and 0.91 for VGG16 in distinguishing between corrugation and squat defects indicate that the model identified a high percentage of the actual positive instances for both classes. This means that VGG16 has a relatively low false-negative rate, implying that it rarely misses positive instances of either corrugation or squat defects. This can be seen in Fig. 4(a), which shows that corrugation has a 0.09 probability of being classified as squat, while squat has a 0.11 probability of being classified as corrugation. The F1-score combines precision and recall into a single value that summarizes a model's overall performance. It is beneficial when there is an imbalance between the number of positive and negative instances in the dataset. The F1-score is calculated as the harmonic mean of precision and recall, giving equal weight to both metrics. In this case, the F1-scores of 0.88 and 0.89 for VGG16 indicate a balanced trade-off between precision and recall for both the corrugation and squat classes. These scores suggest that VGG16 achieved a good balance between making correct positive predictions (precision) and capturing a high percentage of actual positive instances (recall). VGG16 therefore achieves high precision and high recall simultaneously, which is crucial for accurate defect classification.
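As a quick check of the harmonic-mean relationship, the F1-scores can be recomputed from the precision and recall values reported for VGG16. The helper `f1_score` is an illustrative function written for this example, not part of the study's code.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Precision and recall reported for VGG16 in the text.
vgg16 = {"corrugation": (0.90, 0.86), "squat": (0.87, 0.91)}
for name, (p, r) in vgg16.items():
    print(f"{name}: F1 = {f1_score(p, r):.2f}")
```

Rounded to two decimals, this reproduces the reported F1-scores of 0.88 for corrugation and 0.89 for squat.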
For YOLOv5, the precision and recall for corrugation were 0.84 and 0.94, respectively, with an F1-score of 0.88. The mAP at an IoU threshold of 0.5 (mAP@0.5) for corrugation was 0.93, indicating its capability to localize the defect accurately. In object detection tasks, mAP is a widely used evaluation metric. It measures how well a model localizes and classifies objects in an image, considering different thresholds of intersection over union (IoU) between predicted and ground-truth bounding boxes. mAP@0.5:0.95 is a variation of mAP that averages the precision across a broader range of IoU thresholds, from 0.5 to 0.95 in increments of 0.05. This provides a more comprehensive evaluation of the model's performance across various levels of overlap between predicted and ground-truth bounding boxes. The mAP@0.5:0.95 value of 0.87 for YOLOv5 demonstrates the robustness of the model in localizing and classifying objects. It indicates that YOLOv5 performs consistently well across a range of IoU thresholds, from moderate to high overlaps. A higher mAP@0.5:0.95 value suggests that the model maintains its accuracy even under stricter criteria for matching predicted and ground-truth bounding boxes. Furthermore, in the squat class, YOLOv5 achieved a precision of 0.95 and a recall of 0.79. A precision of 0.95 indicates that most of the predicted bounding boxes for squat defects are correct. A recall of 0.79 means that YOLOv5 identified 79% of the actual squat defects in the dataset.
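To make the IoU thresholds concrete, the sketch below computes the IoU of one predicted box against one ground-truth box and lists the thresholds from the 0.5 to 0.95 range at which that prediction would count as a match. The box coordinates are hypothetical, boxes are assumed to be in (x1, y1, x2, y2) pixel format, and the full mAP computation (ranking detections by confidence and integrating the precision-recall curve) is deliberately omitted.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

# The ten IoU thresholds averaged by mAP@0.5:0.95.
thresholds = [round(0.5 + 0.05 * i, 2) for i in range(10)]

# Hypothetical ground-truth and predicted boxes in pixel coordinates.
truth = (10, 10, 110, 110)
pred = (20, 20, 120, 120)
overlap = iou(truth, pred)
matched_at = [t for t in thresholds if overlap >= t]
print(f"IoU = {overlap:.3f}, matches at thresholds {matched_at}")
```

The example prediction overlaps the ground truth with an IoU of about 0.68, so it counts as correct at the four loosest thresholds and as a miss at the stricter ones. This is why mAP@0.5:0.95 is typically lower than mAP@0.5 for the same model.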
The F1-score of 0.88 for the squat class is the harmonic mean of precision and recall. It provides a single metric summarizing the model's accuracy in identifying squat defects. An F1-score of 0.88 suggests a good balance between precision and recall, indicating that YOLOv5 achieves accurate squat defect detection while limiting false positives and false negatives. These metrics demonstrate the model's capability to detect squat defects consistently with high precision and to provide reliable localization results. In line with this, Fig. 4(b) shows favorable classification performance: the probability of misclassifying corrugation as squat is only 0.04, while the probability of misclassifying squat as corrugation is 0.18.
The mAP@0.5:0.95, which averages over a broader range of IoU thresholds, was 0.87, demonstrating the robustness of the YOLOv5 model. For the squat class, YOLOv5 achieved a precision of 0.95, a recall of 0.79, and an F1-score of 0.88, with mAP@0.5 and mAP@0.5:0.95 values of 0.96 and 0.93, respectively, indicating the model's ability to detect squat defects with high precision. These results highlight the model's detection capability for squat defects, as demonstrated in Fig. 3(b), where the detection accuracy for squat reached 99%. Notably, this performance is comparable to the result reported in [23], where the model achieved a 99% detection accuracy for rail surface defects.
The significance of the study lies in providing valuable insights into the performance of VGG16 and YOLOv5 models for defect classification in the railway industry. By achieving competitive results in distinguishing between corrugation and squat defects, the models offer potential benefits for railway maintenance planning and defect identification [30]. Using computer vision and deep learning techniques in rail defect detection advances the field, enhancing railway safety and maintenance procedures. These findings contribute to the existing body of research on automated rail inspection methods, offering essential implications for the ongoing efforts to improve railway transportation systems' safety and reliability.

IV. CONCLUSION
This research study contributes to the field by concentrating on identifying and localizing railway corrugation and squat defects through the integration of VGG16 for classification and YOLOv5 for detection. The implemented models exhibited commendable performance and achieved high precision for both defect classes. Noteworthy is the research's emphasis on a comprehensive approach that explicitly targets distinct defects, differentiating it from prior studies. Its significance lies in using advanced computer vision and deep learning techniques to identify defects and improve maintenance planning and defect management strategies within the railway sector. This research can serve as a benchmark for future work applying such technologies to railway infrastructure maintenance.
In future research, it would be beneficial to investigate the applicability of these techniques to other types of rail defects and to enhance the models' performance by using larger datasets. Additionally, exploring whether this method can classify the depth and severity of defects, distinguishing between minor and major cases, could further improve the effectiveness of this work. It is essential to note the limitations of this study, including the absence of specific evaluation metrics and the need for more diverse data. Moreover, addressing practical challenges such as real-time deployment and integration with existing systems is crucial for implementing these findings in the railway industry, enabling maintenance staff to prioritize their tasks effectively.

Fig. 3 The predicted test images for (a) classification using VGG16; and (b) detection using YOLOv5