Lung cancer remains the leading cause of cancer-related mortality worldwide, and timely detection and severity classification are critical to improving survival. However, data privacy regulations and institutional data silos often prevent clinical centers from jointly developing robust AI models. This paper presents a framework for privacy-preserving, multi-institutional lung cancer severity classification based on federated learning (FL) with secure aggregation, in which only encrypted model updates are exchanged while raw patient data remain stored locally. The framework comprises four new models: a privacy-preserving federated neural ensemble (PP-FNE), a gradient-boosting (GB)-based FL strategy (MIF-GBF), a hybrid convolutional-transformer network (CF-CTN), and a semi-adaptive federated attention-aggregated model (SAFAM). Each supports multi-institutional collaboration while addressing data heterogeneity and model interpretability and providing strong privacy protection for sensitive health data. The framework is evaluated on a synthetic dataset designed to mimic the clinical heterogeneity of real-world multi-site networks. The best-performing model, SAFAM, achieved an overall classification accuracy of 93.4%, was robust to deliberately crafted noise (1.3% accuracy degradation), and preserved predictive performance under encrypted aggregation with minimal communication overhead per federation round. CF-CTN was strongest in multimodal integration and model interpretability, MIF-GBF offered notable interpretability for GB models specifically, and PP-FNE remained stable as an ensemble under cross-site variability.
All four FL methods restricted cross-site communication to encrypted model updates exchanged via a secure aggregation protocol, in line with the data minimization principles of HIPAA and GDPR. These results indicate that, when an FL approach incorporates task-specific algorithmic innovations, accurate and privacy-protected lung cancer severity detection can be achieved across distributed clinical settings.
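As an illustration of how secure aggregation can hide individual site updates while still yielding the correct sum, the following is a minimal sketch based on pairwise additive masking. It is not the protocol used in this work, whose details are not given here. The function names, the modulus, the seed derivation, and the integer-valued (pre-quantized) updates are all assumptions made for the example; a real protocol would derive pairwise secrets through key agreement and handle site dropouts.

```python
import random

MODULUS = 2**32  # assumed: all arithmetic is done modulo this value


def pairwise_mask(client_id, num_clients, dim, round_seed):
    """Sum of this client's pairwise masks; masks cancel across all clients."""
    mask = [0] * dim
    for other in range(num_clients):
        if other == client_id:
            continue
        lo, hi = min(client_id, other), max(client_id, other)
        # Assumption: both members of a pair already share this seed.
        # Here it is derived from public values purely for illustration.
        rng = random.Random(f"{round_seed}-{lo}-{hi}")
        shared = [rng.randrange(MODULUS) for _ in range(dim)]
        # The lower-numbered client adds the shared vector, the other subtracts it.
        sign = 1 if client_id == lo else -1
        mask = [(m + sign * s) % MODULUS for m, s in zip(mask, shared)]
    return mask


def mask_update(update, client_id, num_clients, round_seed):
    """Return the client's update with its pairwise masks added."""
    mask = pairwise_mask(client_id, num_clients, len(update), round_seed)
    return [(u + m) % MODULUS for u, m in zip(update, mask)]


def aggregate(masked_updates):
    """Server-side sum; the pairwise masks cancel, leaving the true total."""
    return [sum(column) % MODULUS for column in zip(*masked_updates)]


# Three hypothetical sites, each holding a quantized three-parameter update.
updates = [[3, 10, 7], [5, 1, 2], [4, 4, 4]]
masked = [
    mask_update(u, i, len(updates), round_seed=1)
    for i, u in enumerate(updates)
]
total = aggregate(masked)  # equals the element-wise sum [12, 15, 13]
```

The server only ever sees the masked vectors, which look random on their own, yet their sum matches the sum of the unmasked updates. Dividing that sum by the number of sites would give a federated average.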
Srividya et al. (Thu,) studied this question.