What question did this study set out to answer?

To develop a framework for accurate lung cancer severity detection while ensuring data privacy across multiple clinical institutions.

March 7, 2026 | Open Access

Privacy-Preserving Federated Learning for Multi-Institutional Lung Cancer Severity Detection

Key Points

Introduced a privacy-preserving federated neural ensemble model (PP-FNE).
Developed a gradient boosting-based federated learning strategy (MIF-GBF).
Created a hybrid convolutional-transformer network (CF-CTN).
Established a semi-adaptive federated attention-aggregated model (SAFAM).
Evaluated the framework using a synthetic dataset representing clinical heterogeneity.
The SAFAM model achieved an overall classification accuracy of 93.4%.
SAFAM remained robust, with only a 1.3% accuracy degradation under intelligently crafted noise.
Strong privacy protection was maintained using encrypted model updates.
All models exhibited effective interpretability and aligned with the data minimization principles of HIPAA and GDPR.

Abstract

Lung cancer remains the most common cause of cancer-related mortality globally, and timely detection and classification of its severity are critical to improving survival. Nonetheless, data privacy regulations and institutional data silos often hinder the development of robust AI models across clinical centers. This paper presents a framework for privacy-preserving multi-institutional lung cancer severity classification using federated learning (FL) with secure aggregation, in which only encrypted model updates are exchanged while raw patient data remain stored locally. The framework comprises four new components: a privacy-preserving federated neural ensemble model (PP-FNE), a gradient boosting (GB)-based FL strategy (MIF-GBF), a hybrid convolutional-transformer network (CF-CTN), and a semi-adaptive federated attention-aggregated model (SAFAM). Each component supports collaboration across sites while addressing data heterogeneity and model interpretability and providing strong privacy protection for sensitive health data. The framework is evaluated on a synthetic dataset designed to mimic the clinical heterogeneity of real-world multi-site networks. The best-performing model, SAFAM, achieved an overall classification accuracy of 93.4%, was robust to intelligently crafted noise (1.3% accuracy degradation), and preserved predictive performance under encrypted aggregation with minimal communication overhead per federation round. CF-CTN's strengths lie in multimodal integration for severity classification and in model interpretability, while MIF-GBF was notable for the interpretability of its GB models. PP-FNE remained stable as an ensemble under variability across sites. All four FL methods restricted cross-site communication to encrypted model updates exchanged via a secure aggregation protocol, aligning with the data minimization principles of HIPAA and GDPR. These results provide evidence that, when an FL approach incorporates task-autonomous algorithmic innovations, accurate and privacy-protected lung cancer severity detection can be achieved in a distributed clinical setting.
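The abstract does not say which secure aggregation protocol the framework uses. As a rough illustration of the general idea only, the sketch below uses pairwise additive masking, a common textbook construction, and should not be read as the authors' method. Each site hides its update behind random masks that cancel when all sites' contributions are summed, so the aggregator learns only the average. The function names, the fixed-point `SCALE`, the `MODULUS`, and the seed-derived masks are all assumptions made for this example. A real deployment would derive masks from a key agreement between sites and would need to handle site dropout.

```python
import random

MODULUS = 2**32   # assumed: all masked arithmetic is done modulo this value
SCALE = 10**4     # assumed: fixed-point scale for encoding floats as integers


def encode(update):
    """Quantize a list of floats to non-negative integers modulo MODULUS."""
    return [round(x * SCALE) % MODULUS for x in update]


def decode(total, n_sites):
    """Map a summed integer vector back to the mean float update."""
    mean = []
    for v in total:
        if v >= MODULUS // 2:   # upper half of the range encodes negatives
            v -= MODULUS
        mean.append(v / SCALE / n_sites)
    return mean


def pairwise_mask(i, j, length, round_seed):
    """Hypothetical shared mask for sites i and j, derived from a common seed.

    In practice the seed would come from a key exchange between the two
    sites, not from a value the aggregator could also compute.
    """
    rng = random.Random(f"{round_seed}:{min(i, j)}:{max(i, j)}")
    return [rng.randrange(MODULUS) for _ in range(length)]


def mask_update(site, update, n_sites, round_seed):
    """Return the site's update with all of its pairwise masks applied."""
    masked = encode(update)
    for other in range(n_sites):
        if other == site:
            continue
        mask = pairwise_mask(site, other, len(update), round_seed)
        sign = 1 if site < other else -1   # one site adds, its partner subtracts
        masked = [(m + sign * k) % MODULUS for m, k in zip(masked, mask)]
    return masked


def aggregate(masked_updates):
    """Sum masked updates; the pairwise masks cancel, leaving the true sum."""
    total = [sum(column) % MODULUS for column in zip(*masked_updates)]
    return decode(total, len(masked_updates))


# Three hypothetical sites, each holding a two-parameter local update.
local_updates = [[0.5, -1.0], [1.5, 2.0], [-0.5, 0.25]]
masked = [mask_update(s, u, len(local_updates), round_seed=7)
          for s, u in enumerate(local_updates)]
global_update = aggregate(masked)
```

In this sketch the aggregator only ever sees the `masked` vectors, each of which looks like random integers on its own. `global_update` matches the plain average of the local updates up to the fixed-point rounding set by `SCALE`.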
