Early detection of skin cancer is vital for effective treatment and for improving patient recovery. In recent years, a growing number of computer vision methods have been developed to aid diagnosis, drawing significant attention from researchers. However, challenges persist, including data imbalance and the lack of comprehensive datasets. In addition, little research has examined how variation in skin tone across populations affects model performance in skin lesion classification. This study seeks a more effective approach to addressing data biases in lesion classification across diverse skin tones. We first explored several data augmentation techniques, using traditional feature extractors for image analysis and k-Nearest Neighbor, Random Forest, and Support Vector Machine models for classification. The study focused on two well-known, publicly available skin lesion datasets, HAM10000 and PAD-UFES-20, both of which exhibit significant class imbalance. Further experiments assessed potential biases, applying the Individual Typology Angle (ITA) metric to evaluate the skin tone distribution within the datasets.
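As a point of reference, the ITA is conventionally computed from the CIELAB lightness L* and yellow-blue component b* of a skin region as ITA = arctan((L* - 50) / b*) × 180/π. The sketch below illustrates that calculation only. The function names are hypothetical, the inputs are assumed to be mean L* and b* values already extracted from non-lesion skin pixels, and the category thresholds are the commonly cited ones, which may differ from those used in this study.

```python
import math


def ita_degrees(l_star: float, b_star: float) -> float:
    """Individual Typology Angle in degrees from CIELAB L* and b*.

    atan2 is used so that b* = 0 does not raise a division error;
    for b* > 0 it equals arctan((L* - 50) / b*).
    """
    return math.degrees(math.atan2(l_star - 50.0, b_star))


def ita_category(ita: float) -> str:
    """Map an ITA value to a skin tone group.

    Thresholds are the commonly cited ones (assumed here, not taken
    from the study): higher ITA corresponds to lighter skin.
    """
    if ita > 55:
        return "very light"
    if ita > 41:
        return "light"
    if ita > 28:
        return "intermediate"
    if ita > 10:
        return "tan"
    if ita > -30:
        return "brown"
    return "dark"


if __name__ == "__main__":
    # Hypothetical mean (L*, b*) values for three image patches.
    for l_star, b_star in [(70.0, 20.0), (50.0, 15.0), (30.0, 20.0)]:
        angle = ita_degrees(l_star, b_star)
        print(f"L*={l_star}, b*={b_star}: ITA={angle:.1f}, {ita_category(angle)}")
```

Applying such a mapping to every image in a dataset and counting the resulting groups gives one simple way to inspect its skin tone distribution.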
Lopes et al. (Mon,) studied this question.