April 8, 2026 | Open Access

Generalised and Explainable Multi-Scale YOLO-SAHI Framework of Robust Small Object Detection on Multi-Domain Applications.

Key Points

The aim is to enhance small object detection accuracy and explainability in various challenging visual contexts.
Developed a multi-scale YOLO-SAHI framework incorporating slicing-aided inference and adaptive feature enhancement.
Utilized Contrast Limited Adaptive Histogram Equalisation (CLAHE) for better visibility in low-contrast images.
Implemented an uncertainty-triggered explainability mechanism, applying Grad-CAM only to low-confidence predictions.
Evaluated the model in multiple domains such as industrial fault detection and medical image analysis.
Expected to improve mean Average Precision (mAP) for small object detection in diverse conditions.
Expected to perform better in challenging visual contexts, with reduced boundary loss.
Expected to provide effective interpretability at lower computational cost.
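
The overlap-based slicing mentioned in the key points can be illustrated with a minimal sketch. It is not the authors' implementation or the SAHI library's code: the function names, the 512-pixel tile size, and the 0.2 overlap ratio are assumptions chosen for illustration. It only computes the tile windows and shows how a box detected inside a tile maps back to full-image coordinates.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max) in pixels


def axis_starts(length: int, tile: int, overlap_ratio: float) -> List[int]:
    """Tile start offsets along one axis; the last tile sits flush with the edge."""
    if tile >= length:
        return [0]
    # Step between neighbouring tiles; a smaller step means more overlap.
    step = max(1, int(tile * (1.0 - overlap_ratio)))
    starts = list(range(0, length - tile, step))
    starts.append(length - tile)
    return starts


def slice_boxes(width: int, height: int, tile: int = 512,
                overlap_ratio: float = 0.2) -> List[Box]:
    """Overlapping tile windows covering a width x height image (assumed defaults)."""
    if tile <= 0:
        raise ValueError("tile must be positive")
    if not 0.0 <= overlap_ratio < 1.0:
        raise ValueError("overlap_ratio must be in [0, 1)")
    return [
        (x, y, min(x + tile, width), min(y + tile, height))
        for y in axis_starts(height, tile, overlap_ratio)
        for x in axis_starts(width, tile, overlap_ratio)
    ]


def to_full_image(box: Box, tile_origin: Tuple[int, int]) -> Box:
    """Shift a box predicted inside a tile back into full-image coordinates."""
    ox, oy = tile_origin
    x0, y0, x1, y1 = box
    return (x0 + ox, y0 + oy, x1 + ox, y1 + oy)


if __name__ == "__main__":
    tiles = slice_boxes(1024, 1024)
    print(len(tiles))  # 9 tiles: three start offsets per axis
```

Because neighbouring tiles overlap, an object cut by one tile's border is likely to appear whole in an adjacent tile. A full pipeline would then merge duplicate detections across tiles, for example with non-maximum suppression, which this sketch leaves out.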

Abstract

Identifying small objects remains a difficult problem in computer vision, owing to the loss of important features, edge disruptions during image processing, and inconsistent performance across situations. Although modern deep learning methods, especially YOLO-based detectors, deliver fast predictions, they often struggle to detect small objects accurately in high-resolution images. Moreover, current explainability methods such as Grad-CAM are computationally expensive to apply to every prediction. To address these issues, this paper presents a Generalised and Explainable Multi-Scale YOLO-SAHI framework that combines slicing-aided inference, adaptive feature enhancement, and selective explainability. By combining YOLO-based detection with the SAHI approach, the system uses overlap-based slicing to better locate tiny objects and reduce boundary loss. It also employs Contrast Limited Adaptive Histogram Equalisation (CLAHE) to improve visibility in low-contrast and poorly lit images. A key contribution is an uncertainty-triggered explainability mechanism in which Grad-CAM is applied only to low-confidence predictions, cutting computational cost while preserving interpretability. The proposed model is evaluated across several domains, including industrial fault detection, agricultural disease detection, and medical image analysis, to test whether it works well in different situations. Expected outcomes include improved mean Average Precision (mAP) for small object detection, better performance in challenging visual contexts, and effective interpretability with reduced overhead. By offering a scalable, efficient, and interpretable framework for recognizing small objects in real-world settings, this study addresses significant gaps in computational efficiency, accuracy, and general applicability.
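
The uncertainty-triggered explainability idea can be sketched as a simple confidence gate. This is an illustrative assumption rather than the paper's method: the `Detection` type, the `explain_uncertain` function, and the 0.5 threshold are hypothetical, and `explain_fn` stands in for whatever Grad-CAM routine the framework actually calls.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Tuple


@dataclass
class Detection:
    """Hypothetical container for one detector output."""
    box: Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)
    label: str
    confidence: float  # detector score in [0, 1]


def explain_uncertain(
    detections: List[Detection],
    explain_fn: Callable[[Detection], Any],
    threshold: float = 0.5,  # assumed value; the source does not state one
) -> Dict[int, Any]:
    """Run the costly explainer only on detections scoring below the threshold.

    Returns a mapping from detection index to the explainer's output.
    Confident detections are skipped, which is where the saving comes from.
    """
    explanations: Dict[int, Any] = {}
    for index, detection in enumerate(detections):
        if detection.confidence < threshold:
            explanations[index] = explain_fn(detection)
    return explanations
```

With this gate the number of explainer calls scales with the number of uncertain predictions rather than with the total number of detections, which matches the cost reduction the abstract describes.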
