The use of machine learning models trained on imbalanced datasets that contain sensitive information raises privacy concerns. One significant threat is the Membership Inference Attack (MIA), in which an adversary attempts to determine whether a particular data point was included in the training set. This paper investigates whether algorithm-level cost-sensitive learning poses a greater privacy leakage risk than data-level sampling. We conducted experiments on two datasets: the UCI Adult dataset, used to predict income, and the APS dataset, used to predict scientific productivity. Our results indicate that models trained with cost-sensitive learning are more vulnerable to MIAs than models trained with data-level sampling. This supports the hypothesis that correcting for class imbalance at the algorithm level can leak more private information.
Silva et al. studied this question.
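To make the comparison concrete, the following is a minimal sketch of the two imbalance corrections and a simple confidence-based membership inference attack. It is not the authors' experimental setup. The synthetic two-feature data, the logistic regression model, the random-duplication oversampler, and every function name (`make_data`, `train`, `cost_sensitive_weights`, `oversample`, `attack_auc`) are assumptions introduced here for illustration. The attack scores each point by the probability the model assigns to its true label and reports how well that score separates training points (members) from held-out points (non-members).

```python
import math
import random


def make_data(n, minority_frac, rng):
    """Synthetic 2-feature binary data; label 1 is the minority class."""
    data = []
    for _ in range(n):
        y = 1 if rng.random() < minority_frac else 0
        center = 1.0 if y == 1 else -1.0
        data.append(([rng.gauss(center, 1.5), rng.gauss(center, 1.5)], y))
    return data


def predict_proba(w, x):
    """P(y = 1 | x) under a logistic model; w[0] is the bias term."""
    z = w[0] + w[1] * x[0] + w[2] * x[1]
    z = max(-30.0, min(30.0, z))  # clamp to avoid overflow in exp
    return 1.0 / (1.0 + math.exp(-z))


def train(data, weights, epochs=200, lr=0.1):
    """Weighted logistic regression fitted by batch gradient descent."""
    w = [0.0, 0.0, 0.0]
    total = sum(weights)
    for _ in range(epochs):
        grad = [0.0, 0.0, 0.0]
        for (x, y), s in zip(data, weights):
            err = s * (predict_proba(w, x) - y)
            grad[0] += err
            grad[1] += err * x[0]
            grad[2] += err * x[1]
        w = [wi - lr * g / total for wi, g in zip(w, grad)]
    return w


def cost_sensitive_weights(data):
    """Algorithm-level correction: weight classes inversely to frequency."""
    n = len(data)
    n_pos = sum(y for _, y in data)
    n_neg = n - n_pos
    return [n / (2.0 * n_pos) if y == 1 else n / (2.0 * n_neg) for _, y in data]


def oversample(data, rng):
    """Data-level correction: duplicate minority points until classes balance."""
    pos = [d for d in data if d[1] == 1]
    neg = [d for d in data if d[1] == 0]
    extra = [rng.choice(pos) for _ in range(len(neg) - len(pos))]
    return data + extra


def attack_auc(w, members, non_members):
    """Confidence-based MIA: AUC of true-label probability as a membership score."""
    def score(x, y):
        p = predict_proba(w, x)
        return p if y == 1 else 1.0 - p

    m = [score(x, y) for x, y in members]
    n = [score(x, y) for x, y in non_members]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in m for b in n)
    return wins / (len(m) * len(n))


rng = random.Random(0)
train_set = make_data(300, 0.1, rng)  # members
held_out = make_data(300, 0.1, rng)   # non-members

# Cost-sensitive model: original data, class-dependent loss weights.
w_cost = train(train_set, cost_sensitive_weights(train_set))

# Sampling model: rebalanced data, uniform loss weights.
balanced = oversample(train_set, rng)
w_samp = train(balanced, [1.0] * len(balanced))

auc_cost = attack_auc(w_cost, train_set, held_out)
auc_samp = attack_auc(w_samp, train_set, held_out)
print(f"attack AUC, cost-sensitive: {auc_cost:.3f}")
print(f"attack AUC, oversampling:  {auc_samp:.3f}")
```

An attack AUC near 0.5 means the attacker does no better than chance, and higher values indicate more membership leakage. A low-capacity linear model on synthetic data barely memorizes its training set, so this toy is not expected to reproduce the gap reported in the abstract. It only shows where the two corrections enter the pipeline (loss weights versus resampled data) and how the same attack can be applied to both models.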