Deep learning models have demonstrated strong performance on a wide range of classification tasks across datasets of varying size. However, these models are vulnerable to attack: even very small input changes can introduce errors and lead to misclassification. Adversarial attacks, which can significantly degrade model performance, therefore pose a serious threat. In this work, we demonstrate the vulnerability of deep learning models for clinical contextual text classification to adversarial perturbations. We applied the Fast Gradient Sign Method (FGSM) and Projected Gradient Descent (PGD) to evaluate model robustness and data sensitivity, and the attacks reduced accuracy by 23%. Using white-box attacks, we trained a DistilBERT model and optimized it to withstand these attacks. Our results show that minor input perturbations cause significant shifts in predictions, and we propose a new metric that quantifies the susceptibility of the underlying text as a susceptibility score. Further, the adversarially trained model is significantly more robust to FGSM and PGD attacks.
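Because tokens are discrete, FGSM and PGD are commonly applied to text classifiers in the continuous embedding space. The abstract does not say where the perturbations were applied, so the following minimal sketch assumes embedding-space perturbations under an L-infinity budget. It uses a hypothetical toy logistic-regression classifier over synthetic vectors rather than DistilBERT, and the names `fgsm`, `pgd`, `eps`, `alpha`, and `steps` are illustrative, not taken from the study.

```python
import numpy as np

# Hypothetical toy stand-in: a fixed logistic-regression classifier acting on
# continuous input vectors (e.g. token embeddings). It is NOT DistilBERT; it
# only illustrates the FGSM and PGD update rules under an L-infinity budget eps.


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def input_gradient(x, y, w, b):
    """Gradient of the binary cross-entropy loss with respect to the inputs x."""
    p = sigmoid(x @ w + b)          # predicted probability of class 1, shape (n,)
    return (p - y)[:, None] * w     # dL/dz = p - y and dz/dx = w, shape (n, d)


def accuracy(x, y, w, b):
    """Fraction of examples whose predicted label (logit > 0) matches y."""
    return float(np.mean((x @ w + b > 0) == (y == 1)))


def fgsm(x, y, w, b, eps):
    """Fast Gradient Sign Method: one step of size eps along the gradient sign."""
    return x + eps * np.sign(input_gradient(x, y, w, b))


def pgd(x, y, w, b, eps, alpha, steps, rng):
    """Projected Gradient Descent attack: repeated loss-increasing sign steps,
    each projected back into the L-infinity ball of radius eps around x."""
    x_adv = x + rng.uniform(-eps, eps, size=x.shape)  # random start inside the ball
    for _ in range(steps):
        x_adv = x_adv + alpha * np.sign(input_gradient(x_adv, y, w, b))
        x_adv = np.clip(x_adv, x - eps, x + eps)      # projection step
    return x_adv


rng = np.random.default_rng(0)
x = rng.normal(size=(200, 8))       # 200 synthetic "embedding" vectors
w = rng.normal(size=8)              # fixed classifier weights
b = 0.0
y = (x @ w + b > 0).astype(float)   # labels the clean model classifies correctly

x_fgsm = fgsm(x, y, w, b, eps=0.3)
x_pgd = pgd(x, y, w, b, eps=0.3, alpha=0.05, steps=20, rng=rng)

acc_clean = accuracy(x, y, w, b)
acc_fgsm = accuracy(x_fgsm, y, w, b)
acc_pgd = accuracy(x_pgd, y, w, b)
print(f"clean={acc_clean:.2f} fgsm={acc_fgsm:.2f} pgd={acc_pgd:.2f}")
```

For a linear model like this one, the single FGSM step already reaches the worst-case point in the eps-ball, so PGD ends at the same perturbation. PGD's extra iterations matter for nonlinear models such as DistilBERT, where one linearized step is generally not the strongest attack.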
A study by A Sun examined this question.