Tool wear monitoring is essential for ensuring machining efficiency and product quality, particularly for difficult-to-machine materials such as Inconel 718 (IN718). In image-based tool wear identification systems, conventional deep learning models such as convolutional neural networks (CNNs) often struggle to capture complex wear patterns and lose accuracy across varying machining conditions. To address these limitations, this paper presents a Vision Transformer (ViT) model for identifying tool wear categories during end-milling of IN718. The performance of the ViT-based model is systematically compared with that of a CNN-based EfficientNet-b0 model. The robustness and generalization of the ViT-based model are validated on two previously unseen image datasets: one acquired under conditions similar to those of the training data and another acquired under varying lighting conditions. The results indicate that the ViT model outperforms the EfficientNet-b0 model in both classification accuracy and computational efficiency: it reaches higher accuracy in fewer training epochs and converges faster. Furthermore, it generalizes well across different lighting conditions, demonstrating robustness to variations in the machining environment. These findings demonstrate the effectiveness of ViT in tool wear classification and its potential as a reliable, efficient algorithm for developing tool wear monitoring systems for practical machining applications.
Singh et al. studied this question.