In machine learning, training a classification model means learning the mapping from features to labels. Label noise severely degrades model performance, often more than feature noise does. Consequently, label noise cleaning is one of the most active topics in data quality research. Many approaches to label noise rely on either filtering or correction. Used alone, each often fails to achieve satisfactory results; used together, they typically perform better. CNC-NOS is an advanced label noise cleaning method that uses an integrated filter and noise scores to identify and process noise. However, its clean function computes noise scores from absolute distance and therefore fails to capture the density of noisy samples among a sample's neighbors. Furthermore, it still determines neighbors by Euclidean distance, which does not adequately account for their spatial distribution. This paper therefore proposes LNC-RDNCN, a multi-class label noise cleaning method based on relative density and nearest centroid neighbors (NCN). Extensive simulation experiments demonstrate that the method can accurately identify noisy data, apply appropriate correction and filtering to improve data quality, and generally outperform other noise processing methods in average accuracy.
Fu et al. studied this question.
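To illustrate why nearest centroid neighbors can reflect spatial distribution better than plain Euclidean nearest neighbors, the following is a minimal sketch of the generic NCN selection rule: the first neighbor is the closest point, and each later neighbor is the point that brings the centroid of all selected neighbors closest to the query. The function name `nearest_centroid_neighbors` and the toy data are assumptions made for illustration; this is not the LNC-RDNCN implementation, and the abstract does not specify how the paper computes relative density or noise scores.

```python
from math import dist


def nearest_centroid_neighbors(query, points, k):
    """Return indices of k nearest centroid neighbors of `query` (illustrative sketch)."""
    if not 0 <= k <= len(points):
        raise ValueError("k must be between 0 and len(points)")
    dim = len(query)
    selected = []                  # indices chosen so far, in selection order
    running_sum = [0.0] * dim      # coordinate-wise sum of the selected points
    remaining = set(range(len(points)))
    for _ in range(k):
        count = len(selected) + 1
        best_idx, best_d = None, float("inf")
        for idx in sorted(remaining):
            # Centroid of the already selected points plus this candidate.
            centroid = [(running_sum[j] + points[idx][j]) / count for j in range(dim)]
            d = dist(centroid, query)
            if d < best_d:
                best_idx, best_d = idx, d
        selected.append(best_idx)
        remaining.remove(best_idx)
        running_sum = [running_sum[j] + points[best_idx][j] for j in range(dim)]
    return selected


# Toy data (hypothetical): two points to the right of the query, one to the left, one above.
points = [(1.0, 0.0), (1.1, 0.0), (-1.5, 0.0), (0.0, 3.0)]
query = (0.0, 0.0)
print(nearest_centroid_neighbors(query, points, 2))  # [0, 2]
```

On this toy data, the two Euclidean nearest neighbors are the points at indices 0 and 1, which both lie on the same side of the query. The NCN rule instead picks indices 0 and 2, which surround the query, because the second choice is judged by where the centroid lands and not by the candidate's own distance.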