Federated learning (FL) has emerged as a popular paradigm for distributed machine learning over decentralized data. A typical FL training task involves a fleet of client devices holding private data and a centralized server that aggregates their updates into a global model. Data generated by FL clients, e.g., smartphones, vehicles, and cameras, is prone to noise. While the impact of data noise on centralized learning (CL) is well understood, to the best of our knowledge no systematic study has examined its impact on FL. In this paper, we fill this gap with an empirical investigation of how data noise affects FL. Our study is enabled by DataNoiseGenerator, an open-source and extensible toolkit that we developed for injecting controlled data noise across five diverse data modalities: image, video, audio, text, and tabular data. We carry out extensive experiments on the noisy data generated by DataNoiseGenerator, and our results reveal that FL is significantly more vulnerable to data noise than CL in terms of the quality of the trained ML models. This gap between FL and CL widens as the intensity of data noise and the proportion of noisy FL clients increase. We further present a detailed analysis to diagnose the root cause of this increased sensitivity. Our analysis finds that the aggregation performed by the FL server can amplify divergent updates from FL clients trained on noisy data, thereby hindering global model convergence. We conclude that data quality issues are a fundamental challenge for deploying robust FL systems and demand novel decentralized data cleaning mechanisms.
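To make the aggregation step mentioned in the abstract concrete, the following is a minimal sketch of weighted averaging in the style of FedAvg, a widely used FL aggregation rule. The abstract does not say which aggregation rule the paper evaluates, so the choice of FedAvg, the function name `fedavg`, and all client values below are assumptions made purely for illustration; they are not taken from the paper or from DataNoiseGenerator.

```python
from typing import List, Sequence


def fedavg(client_models: Sequence[Sequence[float]],
           client_sizes: Sequence[int]) -> List[float]:
    """Average client parameter vectors, weighted by local dataset size.

    Hypothetical helper for illustration only.
    """
    if not client_models or len(client_models) != len(client_sizes):
        raise ValueError("need one dataset size per client model")
    total = sum(client_sizes)
    dim = len(client_models[0])
    return [
        sum(model[i] * size for model, size in zip(client_models, client_sizes)) / total
        for i in range(dim)
    ]


# Made-up local models with two parameters each.
# In the clean case all three clients agree.
clean_models = [[1.0, 2.0], [1.0, 2.0], [1.0, 2.0]]
# In the noisy case the third client (assumed to hold noisy data) diverges.
noisy_models = [[1.0, 2.0], [1.0, 2.0], [5.0, -2.0]]
# Assumed local dataset sizes; the divergent client holds the most data.
sizes = [100, 100, 200]

global_clean = fedavg(clean_models, sizes)
global_noisy = fedavg(noisy_models, sizes)

print(global_clean)  # [1.0, 2.0]
print(global_noisy)  # [3.0, 0.0]
```

In this toy setting a single divergent client with a large local dataset pulls the averaged model well away from the parameters the other clients agree on. It only illustrates why server-side averaging is sensitive to divergent updates; it does not reproduce the paper's experiments or analysis.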
Hu et al. studied this question.
synapsesocial.com/papers/6a0d5089f03e14405aa9c68b — DOI: https://doi.org/10.1145/3802124
Jinming Hu
University of Toronto
Jiahao Gu
University of Toronto
Kenta Ploch
University of Toronto
Proceedings of the ACM on Management of Data
University of Toronto
National University of Singapore
Microsoft (United States)