Machine unlearning in federated graph learning must satisfy a multi-level indistinguishability requirement: the deletion of a target node must be undetectable in the global model, in the unlearning client’s local model, and in every non-target client’s local model. Approximate unlearning methods that pass confidence-based audits may still leave geometric traces, in the form of embedding drift, at one or more of these K+1 levels (one global model plus K client models). We formalize this requirement, introduce a five-model threat taxonomy, and extend the Hub–Ripple embedding drift audit to global, local, and cross-client levels. Across 31,900 trials spanning five graph benchmarks, five federated unlearning methods, and four supplementary ablations (K-value, cross-edge handling, control sampling, and DP-SGD defense), we find that all approximate methods fail the multi-level requirement: the Confidence–Embedding Gap persists at 0.12 (versus 0.35 in the centralized setting), cross-client leakage correlates with shared cross-edge count (r = 0.56, p < 10^-160), and a federated participant outperforms a white-box external auditor (AUC 0.83 versus 0.81). Client-level unlearning is more detectable at the global level than node-level unlearning (AUC 0.81 versus 0.77), contradicting the intuition that coarser deletion yields stronger privacy. FedRetrain satisfies global and local indistinguishability but exhibits residual cross-client leakage (Cross-Mean L2 AUC = 0.62 ± 0.04) because re-aggregation itself perturbs the global parameter vector. No evaluated method achieves full multi-level indistinguishability. Supplementary studies confirm that this is a structural property of FedAvg: DP-SGD reduces Cross L2 AUC by only 0.013 at the cost of a 79% accuracy drop, and FedSage-like neighbor sharing does not change the leakage profile. Multi-level geometric auditing, spanning all K+1 models, is the minimum evaluation standard that any method claiming verifiable privacy compliance must meet.
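To make the idea of an embedding drift audit concrete, the following is a minimal sketch and not the paper's Hub–Ripple procedure. It assumes that, for each of the K+1 models, node embeddings are available before and after unlearning, and that an auditor scores each node by the L2 distance between the two. The function names (`l2_drift`, `auc`, `multi_level_audit`) and the split into "affected" and "control" node sets are hypothetical choices made for illustration only.

```python
import numpy as np

def l2_drift(emb_before, emb_after):
    """Per-node L2 distance between embeddings before and after unlearning."""
    return np.linalg.norm(np.asarray(emb_after) - np.asarray(emb_before), axis=1)

def auc(pos, neg):
    """Probability that a random positive score exceeds a random negative one.

    Ties count as 0.5, so an AUC of 0.5 means the two groups are indistinguishable.
    """
    pos = np.asarray(pos, dtype=float)[:, None]
    neg = np.asarray(neg, dtype=float)[None, :]
    return float(((pos > neg) + 0.5 * (pos == neg)).mean())

def multi_level_audit(before, after, affected, control):
    """Run the drift audit on every model (assumed: one global plus K client models).

    before, after: dict mapping model name -> (n_nodes, dim) embedding array.
    affected:      indices of nodes near the deleted target (assumption).
    control:       indices of nodes assumed unrelated to the target.
    Returns a dict mapping model name -> AUC of affected vs. control drift.
    """
    results = {}
    for name in before:
        drift = l2_drift(before[name], after[name])
        results[name] = auc(drift[affected], drift[control])
    return results

# Toy usage: the global model shifts the affected nodes, the client model does not.
before = {"global": np.zeros((6, 3)), "client_0": np.zeros((6, 3))}
shifted = np.zeros((6, 3))
shifted[[0, 1]] = 1.0
after = {"global": shifted, "client_0": np.zeros((6, 3))}
print(multi_level_audit(before, after, affected=[0, 1], control=[2, 3, 4, 5]))
```

In this sketch, a method would meet the multi-level requirement only if the AUC stays near 0.5 for every model in the dictionary, not just for the global one.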
Han et al. (Mon,) studied this question.