January 1, 2021 | Open Access

Understanding Unintended Memorization in Language Models Under Federated Learning

Abstract

Recent work has shown that language models (LMs), e.g., for next word prediction (NWP), tend to memorize rare or unique sequences in the training data. Since useful LMs are often trained on sensitive data, it is critical to identify and mitigate such unintended memorization. Federated Learning (FL) has emerged as a novel framework for large-scale distributed learning tasks. It differs in many aspects from the well-studied central learning setting, where all the data is stored at a central server and minibatch stochastic gradient descent is used for training. This work is motivated by our observation that NWP models trained under FL exhibited a remarkably lower propensity for such memorization than models trained in the central learning setting. Thus, we initiate a formal study to understand the effect of different components of FL on unintended memorization in trained NWP models. Our results show that several of the components that distinguish FL play an important role in reducing unintended memorization. First, we discover that the clustering of data according to users, which happens by design in FL, has the most significant effect in reducing such memorization. Using the Federated Averaging optimizer with larger effective minibatch sizes for training causes a further reduction. We also demonstrate that training in FL with a user-level differential privacy guarantee results in models that can provide high utility while being resilient to memorizing out-of-distribution phrases with thousands of insertions across over a hundred users in the training set.
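The abstract names three ingredients: data grouped by user, the Federated Averaging optimizer, and a user-level differential privacy guarantee. The following is a minimal sketch of how these typically fit together in one training round, not the authors' implementation. It assumes a toy linear model in place of an NWP language model, and every name in it (`local_update`, `clip_l2`, `fedavg_round`, `clip_norm`, `noise_multiplier`) is hypothetical. It also omits client sampling and privacy accounting, both of which a real user-level guarantee would need.

```python
import math
import random


def local_update(weights, user_examples, lr=0.1, epochs=1):
    """Run plain SGD on ONE user's examples and return the weight delta.

    Each update is computed from a single user's data only, which is the
    "clustering of data according to users" that FL has by design.
    """
    w = list(weights)
    for _ in range(epochs):
        for x, y in user_examples:
            pred = sum(wi * xi for wi, xi in zip(w, x))
            err = pred - y
            # Gradient of 0.5 * (pred - y)^2 with respect to w.
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
    return [new - old for new, old in zip(w, weights)]


def clip_l2(delta, clip_norm):
    """Scale a per-user delta so its L2 norm is at most clip_norm."""
    norm = math.sqrt(sum(d * d for d in delta))
    scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
    return [d * scale for d in delta]


def fedavg_round(weights, data_by_user, clip_norm=1.0,
                 noise_multiplier=0.0, rng=None):
    """One round: clipped per-user deltas are averaged, then noised.

    Assumption: Gaussian noise with standard deviation
    noise_multiplier * clip_norm / num_users is added to the average.
    With noise_multiplier=0.0 this reduces to clipped Federated Averaging.
    """
    rng = rng or random.Random(0)
    deltas = [clip_l2(local_update(weights, examples), clip_norm)
              for examples in data_by_user.values()]
    n = len(deltas)
    avg = [sum(column) / n for column in zip(*deltas)]
    sigma = noise_multiplier * clip_norm / n
    noisy = [a + rng.gauss(0.0, sigma) for a in avg]
    return [w + d for w, d in zip(weights, noisy)]


# Toy usage: two users whose data follow y = 2 * x.
data_by_user = {
    "user_a": [([1.0], 2.0)],
    "user_b": [([2.0], 4.0)],
}
weights = [0.0]
for _ in range(50):
    weights = fedavg_round(weights, data_by_user)
```

Clipping bounds how far any single user can move the model in a round, and averaging over more users per round corresponds to the larger effective minibatch the abstract mentions. Setting `noise_multiplier` above zero adds the noise that a differential privacy analysis would calibrate.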

Authors

Om Thakkar

Swaroop Ramaswamy

Rajiv Mathews

Institutions

Google (United States)
