• A novel IP-ViT model is proposed for PV fault classification using IR images.
• Adaptive multi-SE pooling enhances feature extraction and channel weighting.
• A two-stage vFOX–Grid Search optimization improves convergence and accuracy.
• The proposed model achieves 92.78% accuracy and 92.75% F1-score on IR data.

Infrared (IR) imaging enables rapid, non-contact inspection of photovoltaic (PV) modules. However, real-world IR thermal image datasets are often class-imbalanced and low in resolution, which degrades classification accuracy and hinders practical deployment. This paper proposes a simple yet effective framework that integrates denoising diffusion probabilistic model (DDPM)-based data equalization with an improved pooling vision transformer (IP-ViT) to achieve balanced and robust PV fault classification. The DDPM oversamples the minority classes, and the generated images are filtered with the structural similarity index measure (SSIM) so that only high-quality samples are retained. The proposed IP-ViT module processes grayscale images through pooling and squeeze-and-excitation (SE) operations that reweight feature channels within a lightweight multi-layer convolutional structure, yielding discriminative features at reduced computational cost. Furthermore, a two-stage optimization strategy that combines a variant of the FOX optimization algorithm (vFOX) with grid search automatically fine-tunes the IP-ViT architecture. Experimental results on a 12-class IR-based PV fault dataset show that the proposed approach achieves 92.78% accuracy and a 92.75% F1-score, improving on both conventional deep learning models and existing pooling-based vision transformer variants, which supports its effectiveness and potential for real-world applications.
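The SSIM-based filtering of DDPM-generated samples can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a simplified single-window (global) SSIM rather than the usual sliding-window form, flattened grayscale images with values in 0-255, comparison against real images of the same class, and a hypothetical acceptance threshold of 0.5. The names `global_ssim` and `filter_generated` are illustrative only.

```python
def global_ssim(x, y, data_range=255.0):
    """Single-window SSIM between two flattened grayscale images.

    Simplification (assumption): statistics are computed over the whole
    image instead of over local sliding windows.
    """
    if len(x) != len(y) or len(x) == 0:
        raise ValueError("images must be non-empty and of equal size")
    n = len(x)
    # Standard SSIM stabilising constants (K1 = 0.01, K2 = 0.03).
    c1 = (0.01 * data_range) ** 2
    c2 = (0.03 * data_range) ** 2
    mu_x = sum(x) / n
    mu_y = sum(y) / n
    var_x = sum((a - mu_x) ** 2 for a in x) / n
    var_y = sum((b - mu_y) ** 2 for b in y) / n
    cov = sum((a - mu_x) * (b - mu_y) for a, b in zip(x, y)) / n
    numerator = (2 * mu_x * mu_y + c1) * (2 * cov + c2)
    denominator = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return numerator / denominator


def filter_generated(generated, references, threshold=0.5):
    """Keep generated images whose best SSIM against any real reference
    image reaches the threshold (hypothetical value, not from the paper)."""
    kept = []
    for image in generated:
        best = max(global_ssim(image, ref) for ref in references)
        if best >= threshold:
            kept.append(image)
    return kept


# Toy 2x2 images, flattened row by row.
real = [10, 50, 90, 130]
plausible = [12, 52, 88, 131]      # close to the real sample
implausible = [130, 90, 50, 10]    # intensity pattern reversed
kept = filter_generated([plausible, implausible], [real])
```

In this toy case only `plausible` is kept, because the reversed pattern has negative covariance with the real image and therefore a negative SSIM. In practice a windowed SSIM from an image-processing library would be the more usual choice, and the threshold would need to be tuned per class.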
Hong et al. (Sun,) studied this question.