Background
Bias in medical image segmentation can lead to unequal performance across demographic subgroups, raising concerns about the fairness and reliability of clinical AI systems. Although deep learning models achieve high segmentation accuracy, ensuring equitable performance across race and gender remains a significant challenge, particularly in privacy-sensitive healthcare environments.

Methods
This study investigates fairness-aware medical image segmentation of hip and knee radiographs using deep learning models evaluated in both centralized and Federated Learning (FL) settings. We introduce Curriculum Learning (CL) strategies and Progressive Loss (PL) functions that regulate sample difficulty during training. In addition, we propose two novel fairness-oriented FL algorithms: Federated Intersection over Union (FedIoU) and Federated Intersection over Union with Outlier Analysis (FedIoUoutlier). Experiments use multiple segmentation backbones and simulated multi-site data partitions derived from the Osteoarthritis Initiative dataset. Model performance is evaluated with Intersection over Union (IoU), IoU standard deviation, Skewed Error Ratio (SER), and Min-Max Disparity across race and gender subgroups. Statistical significance is assessed with paired t-tests comparing per-sample IoU against baseline configurations.

Results
Across both hip and knee segmentation tasks, curriculum learning and progressive loss strategies consistently improved segmentation accuracy and reduced demographic performance disparities in centralized training. In federated settings, fairness-aware aggregation further improved performance. Notably, FedIoUoutlier combined with balanced curriculum learning and tiered progressive loss achieved the highest mean IoU together with the lowest SER and Min-Max Disparity, indicating improved fairness without a loss of accuracy.
In several configurations, federated models matched or exceeded the performance of optimized centralized models, with statistically significant improvements in per-sample IoU over baseline configurations.

Conclusion
The results demonstrate that structured training strategies and fairness-aware federated aggregation can jointly improve accuracy, stability, and demographic fairness in medical image segmentation. By integrating curriculum learning, progressive loss, and novel FL algorithms, this work provides a practical pathway toward equitable and privacy-preserving AI systems for medical imaging.
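The abstract names its evaluation metrics (IoU, IoU standard deviation, SER, Min-Max Disparity, paired t-tests on per-sample IoU) but does not give their formulas. The following is a minimal Python sketch of how such metrics are commonly computed, under stated assumptions: SER is taken as the worst subgroup error divided by the best subgroup error, with error defined as 1 - IoU; Min-Max Disparity is taken as the gap between the highest and lowest subgroup mean IoU; IoU standard deviation is the population standard deviation of per-sample IoU. All function names and the example scores are hypothetical, and this is not the authors' implementation.

```python
import math
from statistics import mean, pstdev, stdev


def iou(pred, target):
    """IoU of two binary masks given as equal-length flat sequences of 0/1."""
    inter = sum(1 for p, t in zip(pred, target) if p and t)
    union = sum(1 for p, t in zip(pred, target) if p or t)
    # Assumed convention: two empty masks count as a perfect match.
    return 1.0 if union == 0 else inter / union


def subgroup_mean_iou(scores, groups):
    """Mean per-sample IoU for each demographic subgroup label."""
    buckets = {}
    for score, group in zip(scores, groups):
        buckets.setdefault(group, []).append(score)
    return {group: mean(vals) for group, vals in buckets.items()}


def skewed_error_ratio(group_iou):
    """Assumed SER: worst subgroup error / best subgroup error, error = 1 - IoU."""
    errors = [1.0 - v for v in group_iou.values()]
    best = min(errors)
    # A subgroup with zero error makes the ratio unbounded.
    return math.inf if best == 0 else max(errors) / best


def min_max_disparity(group_iou):
    """Assumed disparity: highest minus lowest subgroup mean IoU."""
    return max(group_iou.values()) - min(group_iou.values())


def paired_t_statistic(model_scores, baseline_scores):
    """Paired t statistic on per-sample IoU differences (needs >= 2 samples)."""
    diffs = [m - b for m, b in zip(model_scores, baseline_scores)]
    # Mean difference divided by its standard error (sample std / sqrt(n)).
    return mean(diffs) / (stdev(diffs) / math.sqrt(len(diffs)))


if __name__ == "__main__":
    model = [0.90, 0.85, 0.88, 0.80]     # hypothetical per-sample IoU
    baseline = [0.86, 0.80, 0.85, 0.74]  # hypothetical baseline IoU
    groups = ["A", "A", "B", "B"]        # hypothetical subgroup labels

    per_group = subgroup_mean_iou(model, groups)
    print("mean IoU:", mean(model), "IoU std:", pstdev(model))
    print("SER:", skewed_error_ratio(per_group))
    print("Min-Max Disparity:", min_max_disparity(per_group))
    print("paired t:", paired_t_statistic(model, baseline))
```

With these toy scores, subgroup A averages 0.875 IoU and subgroup B 0.84, so the sketch reports an SER of 1.28 and a disparity of 0.035; lower values of both indicate more even performance across subgroups. The sketch stops at the t statistic because the standard library has no t distribution; if SciPy is available, `scipy.stats.ttest_rel(model, baseline)` returns both the statistic and the p-value.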
Alam et al. studied this question.