Applying differential privacy to deep learning often degrades performance significantly on class-imbalanced medical datasets. Methods that enforce differential privacy by adding noise to gradients are effective on large datasets such as MNIST and CIFAR-100, but perform poorly on small, imbalanced medical datasets such as HAM10000 and ISIC2019. The reason is that, under an imbalanced distribution, the gradients from the few-shot classes are clipped and lose crucial information, while the majority classes dominate learning, so the model settles into a suboptimal solution early in training. To address this issue, we propose SDD-DPSGD, which uses a step-wise dynamic exponential scheduling mechanism for noise and clipping thresholds to preserve gradient information. By allocating more privacy budget and employing higher clipping thresholds during the initial training phases, the model can avoid suboptimal solutions and improve its performance. Experiments show that SDD-DPSGD outperforms comparable algorithms on the HAM10000 and ISIC2019 datasets.
Huang et al. studied this question.
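To make the scheduling idea concrete, the following is a minimal sketch of a step-wise exponential schedule plugged into a standard DP-SGD aggregation step (clip each per-sample gradient, sum, add Gaussian noise, average). It is not the SDD-DPSGD implementation: the abstract does not give the schedule's formula, so the function `stepwise_exponential`, its parameters (`initial`, `rate`, `step_size`), and all numeric values are assumptions chosen for illustration. The sketch assumes that "more privacy budget early" corresponds to a smaller noise multiplier in early epochs, and it omits privacy accounting entirely.

```python
import math
import random


def stepwise_exponential(initial, rate, step_size, epoch):
    """Hypothetical schedule: hold the value constant within each block of
    `step_size` epochs and multiply it by `rate` when a new block starts."""
    return initial * rate ** (epoch // step_size)


def clip(grad, threshold):
    """Scale a per-sample gradient so its L2 norm is at most `threshold`."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, threshold / norm) if norm > 0 else 1.0
    return [g * scale for g in grad]


def noisy_mean_gradient(per_sample_grads, threshold, noise_multiplier, rng):
    """Standard DP-SGD aggregation: clip, sum, add Gaussian noise, average."""
    clipped = [clip(g, threshold) for g in per_sample_grads]
    dim = len(per_sample_grads[0])
    summed = [sum(g[i] for g in clipped) for i in range(dim)]
    std = noise_multiplier * threshold  # noise scales with the clipping bound
    noisy = [s + rng.gauss(0.0, std) for s in summed]
    return [v / len(per_sample_grads) for v in noisy]


rng = random.Random(0)
# Toy per-sample gradients with L2 norms 5.0 and 0.5 (illustrative only).
grads = [[3.0, 4.0], [0.3, 0.4]]

for epoch in range(6):
    # Assumed values: the clipping threshold starts high and halves every
    # 2 epochs; the noise multiplier starts low and doubles every 2 epochs.
    c = stepwise_exponential(4.0, 0.5, 2, epoch)
    sigma = stepwise_exponential(0.5, 2.0, 2, epoch)
    update = noisy_mean_gradient(grads, c, sigma, rng)
    print(epoch, c, sigma)
```

In this toy setup the larger gradient (norm 5.0) is only lightly clipped while the threshold is 4.0, but is cut to norm 1.0 once the threshold has decayed, which illustrates why a higher early threshold retains more of the information carried by large gradients. A real training run would also need a privacy accountant to track the cumulative budget under the varying noise level.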