What question did this study set out to answer?

To develop a federated learning pipeline for liver tumour classification with a focus on data heterogeneity and transfer learning.

May 6, 2026 | Open Access

Multi-Centre Liver Tumour Classification via Federated Learning: Investigating Data Heterogeneity, Transfer Learning, and Model Efficiency

Key Points

Developed a FedProx-based federated learning pipeline for collaborative training.
Utilized the LiTS dataset for binary classification and evaluated on the 3D-IRCADb dataset.
Conducted experiments with various backbone architectures and assessed different heterogeneity scenarios.
FedProx performed comparably to FedAvg overall, with slightly more stable behaviour in some heterogeneous settings.
A clear validation-to-external gap indicates that external-domain robustness remains challenging and calls for cautious interpretation before deployment.
ImageNet pretraining yielded consistent gains, particularly for data-sparse clients.
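
As an illustration of how a slice-level binary task can be derived from voxel-level annotations, the following is a minimal sketch, not the authors' code. It assumes the common LiTS mask convention (0 = background, 1 = liver, 2 = tumour) and labels a slice positive when it contains at least `min_voxels` tumour voxels. The function name `slice_labels`, the `min_voxels` threshold, and the choice of slice axis are hypothetical; the paper's exact labelling rule is not stated in this summary.

```python
import numpy as np

def slice_labels(mask_volume, tumour_label=2, min_voxels=1, axis=0):
    """Return one 0/1 label per slice along `axis`.

    A slice is labelled 1 if it holds at least `min_voxels` voxels equal to
    `tumour_label` (assumed LiTS convention: 2 = tumour), otherwise 0.
    """
    mask = np.asarray(mask_volume)
    # Sum tumour voxels over every axis except the slice axis.
    other_axes = tuple(i for i in range(mask.ndim) if i != axis)
    counts = (mask == tumour_label).sum(axis=other_axes)
    return (counts >= min_voxels).astype(np.int64)

# Toy volume with 3 slices of 4x4 voxels.
toy = np.zeros((3, 4, 4), dtype=np.uint8)
toy[1, 2, 2] = 2   # one tumour voxel in slice 1
toy[2, 0, 0] = 1   # liver only in slice 2
labels = slice_labels(toy)
```

Raising `min_voxels` would discard slices with only a few annotated tumour voxels, which is one way to trade label noise against the number of positive examples.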

Abstract

This paper investigates federated multi-centre liver tumour classification from contrast-enhanced CT under realistic data heterogeneity and domain shift. To address the practical constraint that medical data are often siloed across institutions, we develop a FedProx-based federated learning pipeline that enables collaborative training without exchanging raw patient data. Using the LiTS dataset as the training domain, we construct a slice-level binary classification task based on voxel-level annotations, while rigorously assessing out-of-distribution generalisation on an external held-out dataset, 3D-IRCADb. We conduct comprehensive experiments across multiple backbone architectures, including ResNet-50, EfficientNet-B3, ViT-B/16, and MobileNetV3-Small, comparing FedProx and FedAvg under three heterogeneity intensities (IID, mild non-IID, and severe non-IID). Furthermore, we evaluate transfer learning strategies, ranging from frozen backbones to partial fine-tuning of the last stage, and perform ablations on the proximal coefficient μ and local epochs E to characterise optimisation behaviour. Our results show that FedProx is generally comparable to FedAvg, with slightly more stable behaviour in some heterogeneous settings. We also observe a clear validation-to-external gap, indicating that external-domain robustness remains challenging and requires cautious interpretation for deployment. ImageNet pretraining yields consistent gains, particularly for data-sparse clients, while partial fine-tuning enhances adaptation to CT-specific features. Finally, MobileNetV3-Small offers a favourable performance–efficiency trade-off by reducing communication payload and computation cost, supporting practical deployment on resource-constrained clinical edge devices.
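
The role of the proximal coefficient μ and the local epochs E can be illustrated with a small sketch. It is not the paper's pipeline: it assumes a logistic-regression client trained with full-batch gradient descent in NumPy, and the names `fedprox_local_update` and `aggregate` are hypothetical. FedProx adds the term (μ/2)·||w − w_global||² to each client's local loss, so setting μ = 0 recovers the FedAvg local update.

```python
import numpy as np

def fedprox_local_update(w_global, X, y, mu=0.01, lr=0.1, epochs=5):
    """Run `epochs` full-batch gradient steps on one client.

    Local objective: logistic loss + (mu / 2) * ||w - w_global||^2.
    With mu = 0 this is the plain FedAvg local update.
    """
    w_global = np.asarray(w_global, dtype=float)
    w = w_global.copy()
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(X @ w)))   # predicted probabilities
        grad_loss = X.T @ (p - y) / len(y)   # gradient of the logistic loss
        grad_prox = mu * (w - w_global)      # gradient of the proximal term
        w -= lr * (grad_loss + grad_prox)
    return w

def aggregate(client_weights, client_sizes):
    """Server step: average client weights, weighted by local sample count."""
    sizes = np.asarray(client_sizes, dtype=float)
    return np.average(np.stack(client_weights), axis=0, weights=sizes)

# One-feature toy client: compare drift from the global model with and without mu.
w0 = np.zeros(1)
X = np.array([[1.0]])
y = np.array([1.0])
w_avg = fedprox_local_update(w0, X, y, mu=0.0, epochs=5)
w_prox = fedprox_local_update(w0, X, y, mu=1.0, epochs=5)
```

In this toy case the proximal term keeps `w_prox` closer to the global weights than `w_avg`, which is the mechanism by which FedProx limits client drift under non-IID data; larger E gives the local model more steps to drift, so μ and E interact.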

Cite This Study

Zhu et al. studied this question.

synapsesocial.com/papers/69fa8eac04f884e66b53102b https://doi.org/10.3390/computers15050286
