Sepsis early warning is hindered by data silos, temporal leakage, and threshold choices that obscure operational performance. We present a leakage-aware federated-learning evaluation pipeline that enforces group/temporal separation and compares models at a fixed alert workload. Stage-1 benchmarks local, FedAvg, and FedProx LSTM/Transformer models on PhysioNet/CinC 2019 using the official A/B partitions in bidirectional cross-hospital evaluation (A→B/B→A) after removing ICULOS. Stage-2 constructs a Sepsis-3-aligned MIMIC-IV task using full SOFA-component features and simulated clients to emulate institutional heterogeneity. Federated training improves out-of-hospital generalization for LSTM models on PhysioNet, whereas Transformer models remain robust across 3–12 h horizons. On MIMIC-IV, fixed alert-rate evaluation (α = 5%) clarifies workload–timeliness trade-offs, and centralized XGBoost achieves the strongest stay-level detection with clinically meaningful lead times. Supplementary privacy and security stress tests further contextualize residual deployment risks. Overall, leakage control and workload-matched evaluation are essential for trustworthy, operationally actionable sepsis early warning.
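The fixed alert-rate comparison can be illustrated with a minimal sketch. It assumes hourly risk scores stored in a NumPy array and a cutoff taken as the (1 − α) quantile of those scores, so that roughly a fraction α of predictions raise an alert. The function names, the quantile-based calibration, and the stay-level detection definition (at least one alert during a septic stay) are illustrative assumptions, not the authors' published implementation.

```python
import numpy as np

def threshold_at_alert_rate(scores, alpha=0.05):
    """Cutoff such that roughly a fraction `alpha` of scores exceed it.

    Assumption: the alert workload is defined over individual
    (e.g. hourly) predictions; tied scores can shift the realized rate.
    """
    return float(np.quantile(np.asarray(scores, dtype=float), 1.0 - alpha))

def realized_alert_rate(scores, threshold):
    """Fraction of predictions that would trigger an alert."""
    return float(np.mean(np.asarray(scores) > threshold))

def stay_level_detection(scores, stay_ids, septic_stays, threshold):
    """Fraction of septic stays with at least one alert (hypothetical metric)."""
    scores = np.asarray(scores)
    stay_ids = np.asarray(stay_ids)
    alerted = set(stay_ids[scores > threshold].tolist())
    septic = set(septic_stays)
    if not septic:
        return float("nan")
    return len(alerted & septic) / len(septic)
```

In practice the cutoff would typically be calibrated on a held-out validation split and then applied unchanged to the test hospital, so that every model is compared at the same alert workload rather than at its own preferred threshold.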
Jin et al. studied this question.