Monitoring the condition of rails is essential to the operational safety of trains. Yet no comprehensive, continuous monitoring capable of detecting problems ranging from wear and tear to possible sabotage is carried out today. The objective of this study is to develop and robustly validate a deep-learning model capable of reliably classifying a broad range of safety-critical rail surface defects. Using deep learning to classify defect types from optical images is a promising approach, but the existing literature does not yet achieve robust, high-performing classification across a broad range of failure types. Many approaches rely mainly on local feature extraction, which risks overlooking the global relationships that are crucial for distinguishing certain defect types. This study addresses this gap by exploiting the specific characteristics of the problem domain (small, local, and global defect types). To this end, DualSightNet is introduced, a hybrid architecture enhanced by an attention module for classifying a broad range of railway track surface defects. The model achieves a five-fold cross-validated average balanced accuracy of 97.55 % on a peer-reviewed, real-world dataset of 5,153 images covering seven defect types, recorded directly from an inspection vehicle under operational conditions. This indicates strong generalization across the diverse real-world variations represented in the dataset. Compared to existing CNN- or Transformer-based approaches, DualSightNet is, to our knowledge, the first approach that combines local and global feature extraction through a gating-based fusion mechanism and then enhances the fused representation with an attention module, which enables substantially more robust multiclass defect recognition. This sets a new benchmark for the problem domain, surpassing previous approaches, which either lack broad defect coverage or do not employ rigorous cross-validated evaluation.
These results have practical implications: they indicate that, by leveraging problem-specific features, neural networks can robustly classify a broad range of defect types. With an inference time of 3.00 ms per image, DualSightNet is suitable for deployment in automated inspection vehicles and real-time monitoring scenarios.
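For readers unfamiliar with the headline metric, balanced accuracy is the unweighted mean of per-class recall, so rare defect types count as much as frequent ones. The following is a minimal sketch of that definition and of averaging it over folds; it is not the authors' evaluation code, and the function names and the class labels "crack" and "spalling" are hypothetical placeholders rather than the dataset's actual defect types.

```python
from collections import defaultdict


def balanced_accuracy(y_true, y_pred):
    """Unweighted mean of per-class recall over the classes present in y_true."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one sample is required")
    correct = defaultdict(int)  # correctly predicted samples per true class
    total = defaultdict(int)    # all samples per true class
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        if t == p:
            correct[t] += 1
    recalls = [correct[c] / total[c] for c in total]
    return sum(recalls) / len(recalls)


def mean_over_folds(fold_scores):
    """Average a per-fold metric, e.g. over the five folds of a cross-validation."""
    return sum(fold_scores) / len(fold_scores)


# Toy example with hypothetical labels (assumption: not the real defect classes).
y_true = ["crack", "crack", "crack", "crack", "spalling", "spalling"]
y_pred = ["crack", "crack", "crack", "spalling", "spalling", "spalling"]

# Recall is 3/4 for "crack" and 2/2 for "spalling", so the mean is 0.875,
# whereas plain accuracy would be 5/6.
score = balanced_accuracy(y_true, y_pred)
```

In a five-fold setting, `balanced_accuracy` would be evaluated once per held-out fold and the five values passed to `mean_over_folds`, which is the kind of average the 97.55 % figure refers to.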