In machine learning, class imbalance remains a bottleneck that limits the performance of traditional classifiers. The problem is usually tackled from two angles: the data level and the algorithm level. At the data level, sampling methods have become the mainstream means of balancing the data distribution, and SMOTE is the most prominent among them. Unlike random over-sampling, which simply duplicates minority samples, SMOTE measures the feature similarity between minority samples and creates new minority samples by interpolating between neighboring samples in the feature space. This approach mitigates the overfitting risk inherent in random over-sampling and makes fuller use of existing sample information to construct a more balanced training set. However, SMOTE has several drawbacks: it adapts poorly to the data distribution, it is highly sensitive to noisy data and outliers, and it relies on pairwise interpolation with no capability for dynamic adjustment. To address these issues, this paper proposes TSP-SMOTE, a novel dynamic interpolation over-sampling method based on a scoring mechanism of regular triangles with perturbations. First, on the basis of a regular triangle tessellation, perturbations are applied to the vertices of the regular triangles. Next, a targeted scoring mechanism is constructed according to the region type, and sampling points are selected dynamically based on this mechanism. Finally, the results are mapped back to the original dimensional space, and multiple rounds of linear interpolation are performed to generate the new samples.
TSP-SMOTE eliminates the reliance on k-nearest neighbors, adapts the number of synthetic samples to the local minority density, synthesizes new samples from multiple instances, and uses information from all classes for region construction to suppress noise. Experimental results show that, compared with other over-sampling methods, TSP-SMOTE achieves the best average rank. In terms of classification accuracy, it attains the highest value in 118 of the 234 metrics, which demonstrates its effectiveness on class-imbalanced problems.
Song et al. studied this question.
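For readers unfamiliar with the baseline, the pairwise interpolation that classic SMOTE performs (and that TSP-SMOTE sets out to replace) can be sketched roughly as follows. This is a minimal illustration of standard SMOTE only, not the TSP-SMOTE procedure; the function name `smote_sketch` and its parameters `n_new`, `k`, and `seed` are hypothetical choices made for this example.

```python
import numpy as np


def smote_sketch(X_min, n_new, k=5, seed=0):
    """Illustrative classic SMOTE: interpolate between a minority sample
    and one of its k nearest minority neighbors (names are hypothetical)."""
    rng = np.random.default_rng(seed)
    X_min = np.asarray(X_min, dtype=float)
    n = len(X_min)
    if n < 2:
        raise ValueError("need at least two minority samples")
    k = min(k, n - 1)  # cannot have more neighbors than other samples

    # Pairwise Euclidean distances among the minority samples.
    diff = X_min[:, None, :] - X_min[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)  # a sample is not its own neighbor

    # Indices of the k nearest minority neighbors of each sample.
    neighbors = np.argsort(dist, axis=1)[:, :k]

    # Pick a random base sample and one of its neighbors for each new point.
    base = rng.integers(0, n, size=n_new)
    chosen = neighbors[base, rng.integers(0, k, size=n_new)]

    # Linear interpolation: x_new = x_base + gap * (x_neighbor - x_base).
    gap = rng.random((n_new, 1))
    return X_min[base] + gap * (X_min[chosen] - X_min[base])
```

Because every synthetic point lies on a segment between two existing minority samples, a noisy or outlying minority sample pulls new points toward itself, which is the sensitivity the abstract points to. Calling `smote_sketch(X_min, n_new=10, k=2)` on a small array of minority samples returns ten new rows with the same number of features.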