Abstract Compared with traditional centralized machine learning, federated learning (FL) can train a good global model while clients upload only gradient information rather than private data, which appears to resolve the data privacy dilemma of centralized machine learning. In practice, however, when the data held by clients are not independent and identically distributed (non-IID), traditional FL aggregation algorithms often perform poorly. Moreover, recent research has shown that the gradient information uploaded by clients may still leak private data and, more seriously, that an unreliable third-party server and curious clients pose a severe threat to the system’s security. To address these issues, we propose an efficient FL scheme that handles non-IID data and preserves privacy. First, to handle non-IID data, we propose an improved aggregation algorithm that dynamically adjusts the aggregation weight of each client’s gradient. Second, to prevent an unreliable server from snooping on clients, or colluding with corrupt clients, to steal private information, we protect the gradients and losses shared by clients by adding masks, which prevents collusion attacks and secures the entire system. Furthermore, we use a linear homomorphic hash to let clients verify the aggregated results returned by the unreliable server. Finally, we implement and analyze the scheme. Experiments on large datasets and neural networks show that, compared with similar schemes, the client’s communication overhead is reduced by 97% and the encryption efficiency is improved by 12.6, without sacrificing model accuracy.
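To illustrate why masking shared gradients need not disturb the aggregate, the following is a minimal sketch of pairwise additive masking over integers modulo a fixed ring size. It is not the construction used in this paper, which the abstract does not specify. The names `MODULUS`, `pairwise_masks`, `mask_gradient`, and `aggregate` are hypothetical, gradients are assumed to be already quantized to non-negative integers, and a seeded pseudorandom generator stands in for the secrets that each pair of clients would share in a real protocol.

```python
import random

MODULUS = 2**32  # hypothetical ring size for quantized gradient entries


def pairwise_masks(num_clients, dim, seed=0):
    """Build one mask per client so that all masks sum to zero mod MODULUS."""
    rng = random.Random(seed)  # stand-in for pairwise shared secrets (assumption)
    masks = [[0] * dim for _ in range(num_clients)]
    for i in range(num_clients):
        for j in range(i + 1, num_clients):
            shared = [rng.randrange(MODULUS) for _ in range(dim)]
            for k in range(dim):
                # Client i adds the shared value, client j subtracts it.
                masks[i][k] = (masks[i][k] + shared[k]) % MODULUS
                masks[j][k] = (masks[j][k] - shared[k]) % MODULUS
    return masks


def mask_gradient(gradient, mask):
    """Hide a quantized gradient by adding the client's mask entrywise."""
    return [(g + m) % MODULUS for g, m in zip(gradient, mask)]


def aggregate(masked_gradients):
    """Server-side sum; the pairwise masks cancel, leaving the true sum."""
    dim = len(masked_gradients[0])
    return [sum(v[k] for v in masked_gradients) % MODULUS for k in range(dim)]


# Three clients with toy quantized gradients.
gradients = [[1, 2, 3], [10, 20, 30], [100, 200, 300]]
masks = pairwise_masks(len(gradients), dim=3)
masked = [mask_gradient(g, m) for g, m in zip(gradients, masks)]
total = aggregate(masked)
```

In this sketch the server sees only the entries of `masked`, yet `total` equals the entrywise sum of the original gradients because each shared value is added once and subtracted once. A deployed scheme would additionally need to handle client dropout and derive the shared values from a key agreement rather than a common seed.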
Ke et al. (Fri,) studied this question.