December 5, 2025 · Open Access

Distributed clustering meets federated learning: a clustering-based approach to data poisoning mitigation

Key Points

Real-time detection improves the integrity of model updates in federated learning systems, addressing data poisoning threats.
Federated learning systems face significant risks from malicious data that can corrupt model training processes.
Targeted data sanitization and clustering methods engage multiple decentralized devices to filter out malicious data effectively.
The Federated Data Sanitization Defense enhances security while preserving privacy and model integrity in federated environments.

Abstract

Data poisoning attacks present a significant challenge to the integrity and reliability of federated learning (FL) systems, where model training occurs collaboratively across decentralized devices. These attacks involve the deliberate injection of malicious data to corrupt the model’s training process, ultimately undermining its performance. Given the decentralized nature of FL and the lack of direct access to local data, detecting and mitigating these attacks becomes particularly difficult, especially in unsupervised scenarios where labeled data is unavailable. In this paper, we introduce a novel Federated Data Sanitization Defense to address these security threats in federated learning environments. This defense mechanism leverages federated clustering to group model updates based on semantic consistency, identifying and isolating outlier updates that are likely to be poisoned. A targeted data sanitization strategy is then applied to filter out malicious data, ensuring that only trustworthy information is used to update the global model. This decentralized process occurs on each participating device, enabling real-time detection and mitigation of data poisoning attacks. Through extensive experiments, we validate the effectiveness of Federated Data Sanitization Defense, demonstrating its ability to enhance the security and robustness of federated learning systems against data poisoning, while preserving privacy and model integrity.
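The abstract does not specify the clustering algorithm or the sanitization rule, so the following is only a minimal sketch of the general idea of grouping model updates and isolating outliers. It assumes a two-cluster spherical k-means over update directions and a "flag the smaller cluster" rule; both are illustrative choices, not the paper's method. The function name `flag_outlier_updates` and the synthetic data are hypothetical, and the sketch runs in one place for readability, whereas the paper describes a decentralized process on each participating device.

```python
import numpy as np

def flag_outlier_updates(updates, n_iter=20):
    """Split updates into two clusters by direction and flag the smaller one.

    Assumption: poisoned updates form a minority that points in a different
    direction from honest updates. This is an illustrative rule only.
    """
    X = np.asarray(updates, dtype=float)
    # Normalise rows so clustering compares direction, not magnitude.
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    X = X / np.clip(norms, 1e-12, None)

    # Deterministic initialisation: the two least similar updates.
    sim = X @ X.T
    i, j = np.unravel_index(np.argmin(sim), sim.shape)
    centers = np.stack([X[i], X[j]])

    for _ in range(n_iter):
        # Assign each update to the centre with the highest cosine similarity.
        labels = np.argmax(X @ centers.T, axis=1)
        for k in range(2):
            members = X[labels == k]
            if len(members):
                c = members.mean(axis=0)
                centers[k] = c / max(np.linalg.norm(c), 1e-12)

    counts = np.bincount(labels, minlength=2)
    if counts[0] == counts[1]:
        # No clear minority: flag nothing rather than guess.
        return np.zeros(len(X), dtype=bool)
    return labels == int(np.argmin(counts))

# Synthetic example: 8 honest updates and 2 sign-flipped (poisoned) updates.
rng = np.random.default_rng(0)
base = np.ones(4)
honest = base + 0.05 * rng.standard_normal((8, 4))
poisoned = -base + 0.05 * rng.standard_normal((2, 4))
updates = np.vstack([honest, poisoned])

flagged = flag_outlier_updates(updates)
# Aggregate only the updates that were not flagged.
clean_mean = updates[~flagged].mean(axis=0)
```

Here the flagged updates are simply dropped before averaging. A real defense would also need to handle cases where attackers are not a clear minority or where honest clients hold heterogeneous data, which this sketch does not address.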
