Text generation models are increasingly integrated into digital environments to provide users with personalized messaging, learning, and productivity content. Centralized training raises serious concerns about user privacy, especially when sensitive data is involved. Existing personalized text generation models often require some form of direct data aggregation, which exposes users to the risk of data leakage or misuse. This limitation hinders their deployment in settings where privacy is a pressing concern, such as healthcare, finance, and personal communications. A practical and user-friendly approach is needed to achieve personalization without compromising confidentiality. FL-PTG (Federated Learning for Personalized Text Generation), a framework that builds on federated learning (FL) to achieve personalized text generation, is proposed. FL lets users contribute model training updates while their raw data remains on their own devices. During training, model updates are anonymized before being sent to a central server, helping to protect user data. Experiments on benchmark datasets show that FL-PTG achieves text generation quality comparable to centralized models, with minimal degradation in perplexity, while significantly reducing the risk of privacy leakage. FL-PTG offers a promising pathway toward personalized, user-relevant, and secure text generation that can be deployed in privacy-sensitive scenarios.
AlShehhi et al. (Thu,) studied this question.
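The training loop described in the abstract (local training on each device, anonymized updates, aggregation at a central server) can be illustrated with a minimal sketch. This is not the FL-PTG implementation: the abstract does not specify the model, the aggregation rule, or the anonymization mechanism. The sketch assumes a toy linear model in place of a text generator, sample-weighted averaging in the style of FedAvg, and "anonymization" modeled simply as sending a payload with no user identifier and shuffling arrival order. The names `Update`, `local_update`, `aggregate`, and `run_round` are hypothetical.

```python
import random
from dataclasses import dataclass
from typing import Dict, List, Tuple

Example = Tuple[List[float], float]  # (features, target); stays on the device


@dataclass
class Update:
    """Payload sent to the server: a weight delta and a sample count only.

    Assumption: carrying no user ID and no raw data stands in for the
    anonymization step, which the abstract does not detail.
    """
    delta: List[float]
    num_samples: int


def local_update(global_weights: List[float], local_data: List[Example],
                 lr: float = 0.1) -> Update:
    """Hypothetical client step: one SGD pass on a toy least-squares model."""
    weights = list(global_weights)
    for features, target in local_data:
        pred = sum(w * x for w, x in zip(weights, features))
        err = pred - target
        weights = [w - lr * err * x for w, x in zip(weights, features)]
    delta = [w - g for w, g in zip(weights, global_weights)]
    return Update(delta=delta, num_samples=len(local_data))


def aggregate(global_weights: List[float], updates: List[Update]) -> List[float]:
    """Server step: apply the sample-weighted average of client deltas."""
    total = sum(u.num_samples for u in updates)
    new_weights = list(global_weights)
    for u in updates:
        share = u.num_samples / total
        new_weights = [w + share * d for w, d in zip(new_weights, u.delta)]
    return new_weights


def run_round(global_weights: List[float],
              clients: Dict[str, List[Example]]) -> List[float]:
    """One federated round: the server sees updates, never the client data."""
    updates = [local_update(global_weights, data) for data in clients.values()]
    random.shuffle(updates)  # discard ordering that could link updates to users
    return aggregate(global_weights, updates)
```

In this sketch the server only ever handles `Update` objects, so raw examples never leave the `clients` mapping that simulates the devices. A real deployment would likely need stronger protections than dropping identifiers, for example secure aggregation or differential privacy, which are outside what the abstract describes.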