Abstract: Data poisoning attacks threaten the integrity and reliability of federated learning (FL) systems, in which a model is trained collaboratively across decentralized devices. In these attacks, an adversary deliberately injects malicious data to corrupt training and degrade the resulting model's performance. Because FL is decentralized and local data cannot be accessed directly, detecting and mitigating such attacks is difficult, especially in unsupervised settings where labeled data is unavailable. In this paper, we introduce a novel Federated Data Sanitization Defense to address these threats. The defense uses federated clustering to group model updates by semantic consistency, then identifies and isolates outlier updates that are likely to be poisoned. A targeted data sanitization strategy then filters out malicious data, so that only trustworthy information is used to update the global model. The process runs on each participating device, enabling real-time detection and mitigation of data poisoning attacks. Extensive experiments show that the Federated Data Sanitization Defense improves the security and robustness of FL systems against data poisoning while preserving privacy and model integrity.
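To make the idea of isolating outlier updates concrete, the following is a minimal sketch and not the method proposed in the paper. The abstract does not specify the clustering algorithm or the sanitization rule, so the sketch substitutes a simple stand-in: it scores each flattened model update by its cosine distance to the coordinate-wise median update and flags updates whose robust z-score exceeds a cutoff. The function names `flag_outlier_updates` and `aggregate_trusted`, the `threshold` parameter, and the choice of cosine distance are all assumptions made for illustration. The sketch also operates on a stacked array of updates in one place, whereas the abstract describes a process that runs on each device.

```python
import numpy as np


def flag_outlier_updates(updates, threshold=3.0):
    """Return a boolean mask marking updates that look like outliers.

    updates:   array of shape (n_clients, n_params), one flattened update per row.
    threshold: hypothetical cutoff on the robust z-score (an assumption,
               not a value taken from the paper).
    """
    updates = np.asarray(updates, dtype=float)

    # The coordinate-wise median acts as a robust reference update.
    reference = np.median(updates, axis=0)

    # Cosine distance of every update to the reference.
    norms = np.linalg.norm(updates, axis=1) * np.linalg.norm(reference)
    cosine = (updates @ reference) / np.maximum(norms, 1e-12)
    distance = 1.0 - cosine

    # Robust z-score using the median absolute deviation (MAD).
    med = np.median(distance)
    mad = np.median(np.abs(distance - med))
    score = (distance - med) / max(1.4826 * mad, 1e-12)

    # Only unusually distant updates are flagged; updates at or below
    # the median distance can never exceed the threshold.
    return score > threshold


def aggregate_trusted(updates, threshold=3.0):
    """Average only the updates that were not flagged as outliers."""
    updates = np.asarray(updates, dtype=float)
    mask = flag_outlier_updates(updates, threshold)
    return updates[~mask].mean(axis=0), mask
```

As a usage example, stacking eight benign updates that point in roughly the same direction with two sign-flipped updates and calling `aggregate_trusted` would be expected to flag the two flipped rows and average the rest. A fixed z-score cutoff is a blunt instrument: it can also flag an occasional benign update, and it assumes that poisoned updates are a minority.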
Chen et al. (Mon,) studied this question.