What question did this study set out to answer?

The aim is to improve skin lesion classification by addressing data biases due to skin tone variations and data imbalance.

March 21, 2026 | Open Access

Bias Analysis and Data Augmentation Strategies in Skin Lesion Classification

Key Points

The study applied data augmentation techniques and used traditional feature extractors for image analysis.
Classification relied on k-Nearest Neighbor, Random Forest, and Support Vector Machine models.
Experiments focused on two publicly available datasets, HAM10000 and PAD-UFES-20, both known for class imbalances.
The Individual Typology Angle (ITA) metric was applied to assess the skin tone distribution within the datasets.
The analysis showed significant class imbalances in both HAM10000 and PAD-UFES-20.
Variations in skin tone were found to affect model performance.
Data augmentation techniques mitigated some of the bias in classification outcomes.
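
The summary does not say which augmentation techniques the study used, so the following is only a minimal sketch of the general idea: minority classes are topped up with augmented copies until every class matches the largest one. The horizontal flip, the function names `horizontal_flip` and `balance_by_augmentation`, and the list-of-rows image representation are all assumptions made for illustration, not the authors' method.

```python
import random
from collections import Counter


def horizontal_flip(image):
    """Mirror a 2D image given as a list of pixel rows (hypothetical helper)."""
    return [list(reversed(row)) for row in image]


def balance_by_augmentation(images, labels, seed=0):
    """Add flipped copies of minority-class images until all classes are equal in size.

    Assumption: a flip stands in for whatever augmentations the study applied.
    """
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())  # size of the largest class

    out_images, out_labels = list(images), list(labels)
    for cls, count in counts.items():
        # Original images of this class, used as the source for augmented copies.
        pool = [img for img, lab in zip(images, labels) if lab == cls]
        for _ in range(target - count):
            out_images.append(horizontal_flip(rng.choice(pool)))
            out_labels.append(cls)
    return out_images, out_labels


# Toy example: three images of one class, one of another.
images = [[[1, 2], [3, 4]]] * 3 + [[[5, 6], [7, 8]]]
labels = ["nv", "nv", "nv", "mel"]
balanced_images, balanced_labels = balance_by_augmentation(images, labels)
```

In practice a step like this would normally be applied to the training split only, so that augmented copies of an image never leak into the evaluation data.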

Abstract

Early detection of skin cancer is vital for effective treatment and for improving patient recovery. In recent years, a growing number of computer vision methods have been developed to aid diagnosis, drawing significant attention from researchers. However, challenges persist, such as data imbalance and the lack of comprehensive datasets. In addition, little research has examined how variations in skin tone across populations affect the performance of skin lesion classification models. This study seeks a more effective approach to addressing data biases in lesion classification across diverse skin tones. We first explored several data augmentation techniques, using traditional feature extractors for image analysis. For classification, we used k-Nearest Neighbor, Random Forest, and Support Vector Machine models. The study focused on two well-known, publicly available skin lesion datasets, HAM10000 and PAD-UFES-20, both of which have significant class imbalances. Further experiments assessed potential biases, applying the Individual Typology Angle (ITA) metric to evaluate the skin tone distribution within the datasets.
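
For readers unfamiliar with the ITA metric, it is usually computed from the CIELAB lightness L* and yellow-blue component b* of a skin pixel or patch as ITA = arctan((L* - 50) / b*) × 180 / π, with larger angles indicating lighter skin. The sketch below assumes L* and b* have already been measured on healthy skin around the lesion. The category thresholds are the commonly cited ones and are an assumption here, since the abstract does not state which cut-offs the study used; the function names are hypothetical.

```python
import math

# Commonly cited ITA cut-offs in degrees, from lightest to darkest.
# Assumption: the abstract does not say which thresholds the study applied.
ITA_CATEGORIES = [
    (55.0, "very light"),
    (41.0, "light"),
    (28.0, "intermediate"),
    (10.0, "tan"),
    (-30.0, "brown"),
]


def individual_typology_angle(l_star: float, b_star: float) -> float:
    """Return the ITA in degrees from CIELAB L* and b* values."""
    # atan2 equals arctan((L* - 50) / b*) whenever b* > 0, which is the
    # typical case for skin, and avoids a division by zero when b* == 0.
    return math.degrees(math.atan2(l_star - 50.0, b_star))


def skin_tone_category(ita_degrees: float) -> str:
    """Map an ITA value to a coarse skin tone label."""
    for lower_bound, label in ITA_CATEGORIES:
        if ita_degrees > lower_bound:
            return label
    return "dark"


ita = individual_typology_angle(70.0, 20.0)
category = skin_tone_category(ita)
```

Counting how many images fall into each category gives the kind of skin tone distribution the abstract refers to, which can then be compared against per-group model performance.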
