Recent works have shown that language models (LMs), e.g., for next word prediction (NWP), have a tendency to memorize rare or unique sequences in the training data. Since useful LMs are often trained on sensitive data, it is critical to identify and mitigate such unintended memorization. Federated Learning (FL) has emerged as a novel framework for large-scale distributed learning tasks. It differs in many aspects from the well-studied central learning setting where all the data is stored at the central server, and minibatch stochastic gradient descent is used to conduct training. This work is motivated by our observation that NWP models trained under FL exhibited remarkably less propensity to such memorization compared to the central learning setting. Thus, we initiate a formal study to understand the effect of different components of FL on unintended memorization in trained NWP models. Our results show that several differing components of FL play an important role in reducing unintended memorization. First, we discover that the clustering of data according to users, which happens by design in FL, has the most significant effect in reducing such memorization. Using the Federated Averaging optimizer with larger effective minibatch sizes for training causes a further reduction. We also demonstrate that training in FL with a user-level differential privacy guarantee results in models that can provide high utility while being resilient to memorizing out-of-distribution phrases with thousands of insertions across over a hundred users in the training set.
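To make the components named in the abstract concrete, the following is a minimal sketch of one Federated Averaging round with per-user update clipping and optional Gaussian noise. It is not the authors' implementation: the toy linear model, the function names (`local_update`, `clip_by_l2_norm`, `fedavg_round`), the learning rate, and the noise scale `noise_multiplier * clip_norm / n` are all assumptions chosen for illustration, and the sketch performs no formal privacy accounting.

```python
import math
import random


def local_update(weights, examples, lr=0.1, epochs=1):
    """Run plain SGD on ONE user's examples and return the model delta.

    Data stays grouped by user: each user trains only on their own examples.
    The model is a toy linear regressor (an assumption, not an NWP model).
    """
    w = list(weights)
    for _ in range(epochs):
        for x, y in examples:
            pred = sum(wi * xi for wi, xi in zip(w, x))
            err = pred - y
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
    return [new - old for new, old in zip(w, weights)]


def clip_by_l2_norm(delta, clip_norm):
    """Scale a user's delta so its L2 norm is at most clip_norm."""
    norm = math.sqrt(sum(d * d for d in delta))
    scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
    return [d * scale for d in delta]


def fedavg_round(weights, user_data, clip_norm=1.0, noise_multiplier=0.0, rng=None):
    """One server round: average the clipped per-user deltas, optionally add noise.

    noise_multiplier=0.0 gives plain Federated Averaging with clipping.
    A positive value adds Gaussian noise to the average (hypothetical
    calibration for illustration only).
    """
    rng = rng or random.Random(0)
    clipped = [
        clip_by_l2_norm(local_update(weights, examples), clip_norm)
        for examples in user_data.values()
    ]
    n = len(clipped)
    sigma = noise_multiplier * clip_norm / n
    averaged = [sum(col) / n + rng.gauss(0.0, sigma) for col in zip(*clipped)]
    return [w + a for w, a in zip(weights, averaged)]


# Hypothetical usage: two users, each holding a single example.
user_data = {
    "user_1": [([1.0, 0.0], 2.0)],
    "user_2": [([0.0, 1.0], -1.0)],
}
weights = [0.0, 0.0]
for _ in range(200):
    weights = fedavg_round(weights, user_data, clip_norm=10.0)
```

Because clipping bounds what any single user can contribute to a round, a phrase inserted by a few users has limited influence on the averaged update. That is the intuition this sketch is meant to convey, not a reproduction of the paper's experiments.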
Om Thakkar
Swaroop Ramaswamy
Rajiv Mathews
Google (United States)
Thakkar et al. studied this question.
www.synapsesocial.com/papers/69d816065c3030ff03d19387 — DOI: https://doi.org/10.18653/v1/2021.privatenlp-1.1