What type of study is this?

This is a quantitative study.

October 22, 2025

Data-level sampling for dealing with imbalanced datasets: better protection against membership inference attacks

Key Points

Models trained using cost-sensitive learning are more vulnerable to membership inference attacks than models trained with data-level sampling.
Experiments conducted on the UCI Adult dataset and APS dataset confirm the findings regarding privacy risks.
Correcting for class imbalance at the algorithm level can reveal more private information about the training data.
The study highlights the need for better protective strategies against privacy risks in machine learning.

Abstract

The use of machine learning models trained on imbalanced datasets with sensitive information has raised privacy concerns. One significant threat is the Membership Inference Attack (MIA), which tries to figure out if a particular data point was included in the training set. This paper investigates whether algorithm-level cost-sensitive learning poses a greater privacy leakage risk than data-level sampling. We conducted experiments using two datasets: the UCI Adult dataset, which focuses on predicting income, and the APS dataset, which focuses on predicting scientific productivity. Our results indicate that models trained with cost-sensitive learning are more vulnerable to MIAs. This supports the hypothesis that correcting for imbalances at the algorithm level can reveal more private information.
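
The comparison described in the abstract can be sketched in a few lines of dependency-free Python. This is a minimal illustration under stated assumptions, not the authors' experimental setup: the data are synthetic (not the UCI Adult or APS datasets), the model is a small logistic regression, and the attack is a simple confidence-based membership score that may differ from the attack used in the paper. All function names (make_imbalanced, train_logreg, balanced_weights, random_oversample, attack_auc) are hypothetical. An attack AUC near 0.5 means the attacker cannot tell training members from non-members; higher values indicate more leakage. A linear model on toy data is not expected to reproduce the paper's finding.

```python
import math
import random


def make_imbalanced(n, minority_frac, rng):
    """Synthetic 2-feature data; label 1 is the minority class (assumption)."""
    data = []
    for _ in range(n):
        y = 1 if rng.random() < minority_frac else 0
        center = 1.0 if y == 1 else -1.0
        data.append(([rng.gauss(center, 1.5), rng.gauss(center, 1.5)], y))
    return data


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def predict_proba(w, x):
    return sigmoid(w[0] + w[1] * x[0] + w[2] * x[1])


def train_logreg(data, class_weight=None, epochs=200, lr=0.1):
    """Logistic regression by batch gradient descent with per-class loss weights."""
    cw = class_weight or {0: 1.0, 1: 1.0}
    total = sum(cw[y] for _, y in data)
    w = [0.0, 0.0, 0.0]
    for _ in range(epochs):
        grad = [0.0, 0.0, 0.0]
        for x, y in data:
            err = cw[y] * (predict_proba(w, x) - y)
            grad[0] += err
            grad[1] += err * x[0]
            grad[2] += err * x[1]
        w = [wi - lr * g / total for wi, g in zip(w, grad)]
    return w


def balanced_weights(data):
    """Algorithm-level correction: weight each class inversely to its frequency."""
    n = len(data)
    n1 = sum(y for _, y in data)
    return {0: n / (2 * (n - n1)), 1: n / (2 * n1)}


def random_oversample(data, rng):
    """Data-level correction: duplicate minority points until classes are equal."""
    minority = [d for d in data if d[1] == 1]
    majority = [d for d in data if d[1] == 0]
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    return data + extra


def attack_auc(w, members, non_members):
    """AUC of a confidence-based attack: score = probability given to the true label."""
    def score(x, y):
        p = predict_proba(w, x)
        return p if y == 1 else 1.0 - p

    m = [score(x, y) for x, y in members]
    n = [score(x, y) for x, y in non_members]
    wins = sum((a > b) + 0.5 * (a == b) for a in m for b in n)
    return wins / (len(m) * len(n))


rng = random.Random(0)
train = make_imbalanced(300, 0.1, rng)    # members
holdout = make_imbalanced(300, 0.1, rng)  # non-members from the same distribution

w_cost = train_logreg(train, class_weight=balanced_weights(train))
w_samp = train_logreg(random_oversample(train, rng))

auc_cost = attack_auc(w_cost, train, holdout)
auc_samp = attack_auc(w_samp, train, holdout)
print(f"attack AUC, cost-sensitive model: {auc_cost:.3f}")
print(f"attack AUC, oversampled model:    {auc_samp:.3f}")
```

In both cases the attack is evaluated on the original training points, so duplicates created by oversampling do not inflate the member set. To probe the paper's hypothesis more seriously, one would swap in a higher-capacity model and the real datasets, since membership leakage generally grows with overfitting.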
