Current computational approaches to prostate cancer diagnosis rely either on linear causal models that cannot handle the nonlinear dependencies among imaging, genomic, and clinical variables, or on deep learning methods that demand exhaustive expert annotation and lack mechanisms for counterfactual reasoning. This work introduces ProCausal-WS, a weakly supervised causal representation learning framework that addresses both limitations simultaneously. It rests on three interlocking components: an invertible flow causal encoder that maps high-dimensional multimodal observations into a low-dimensional space of clinically interpretable causal factors through bijective transformations; an exogenous clinical intervention module that uses dynamic gating and structural equations to simulate treatment scenarios and generate controllable counterfactual predictions; and a weakly supervised alignment mechanism that combines contrastive learning with projection heads to constrain the semantic identifiability of the learned factors using only a small fraction of expert-labeled samples. On the TCGA-PRAD dataset, the framework achieves 92.3% clinical causal concept identification accuracy while requiring complete annotations for only 8% of samples, and on the PANDA dataset, it reaches 89.6% with annotations for 5% of samples. Intervention mean-squared error is reduced to 0.018 on TCGA-PRAD, one quarter of the best baseline's error. Cross-dataset generalization yields an AUROC drop of no more than 0.026 when transferring between institutions with different scanners and staining protocols. Expert pathologists rated 89.6% of the generated counterfactual predictions as biologically plausible, and a longitudinal consistency analysis against real post-treatment biopsies confirms that the counterfactuals track actual disease trajectories rather than hallucinating visually convincing but causally unfaithful features.
Kong et al. (Thu,) studied this question.
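
The encode, intervene, decode pattern that the abstract describes can be illustrated with a toy example. The sketch below is not the ProCausal-WS implementation: it assumes a single affine coupling layer as the bijective encoder, a hand-picked conditioner, and a scalar gate that blends a factor's current value with the intervened value. All names (`encode`, `decode`, `intervene`, `params`) and all numeric values are hypothetical and chosen only for illustration.

```python
import math


def _scale_shift(x1, params):
    """Toy conditioner (an assumption): log-scale and shift computed from x1."""
    h = sum(x1)
    log_s = [math.tanh(a * h) for a in params["a"]]
    shift = [b * h for b in params["b"]]
    return log_s, shift


def encode(x, params):
    """Affine coupling layer: the first half passes through unchanged and
    the second half is scaled and shifted conditioned on the first half."""
    d = len(x) // 2
    x1, x2 = x[:d], x[d:]
    log_s, shift = _scale_shift(x1, params)
    z2 = [v * math.exp(s) + t for v, s, t in zip(x2, log_s, shift)]
    return x1 + z2


def decode(z, params):
    """Exact inverse of encode, which is what makes the mapping bijective."""
    d = len(z) // 2
    z1, z2 = z[:d], z[d:]
    log_s, shift = _scale_shift(z1, params)
    x2 = [(v - t) * math.exp(-s) for v, s, t in zip(z2, log_s, shift)]
    return z1 + x2


def intervene(z, index, value, gate=1.0):
    """Set one latent factor to `value`, leaving the others untouched.
    gate=1.0 is a hard intervention; gate<1.0 blends with the old value."""
    out = list(z)
    out[index] = gate * value + (1.0 - gate) * z[index]
    return out


# Hypothetical parameters and a 4-dimensional observation.
params = {"a": [0.3, -0.2], "b": [0.5, 1.0]}
x = [0.4, -1.2, 2.0, 0.7]

z = encode(x, params)                    # observation -> latent factors
z_cf = intervene(z, index=2, value=0.0)  # hard intervention on factor 2
x_cf = decode(z_cf, params)              # latent -> counterfactual observation
```

Because `decode` inverts `encode` exactly, re-encoding `x_cf` recovers `z_cf` up to floating-point error, so the counterfactual differs from the original only through the intervened factor. A learned flow would stack many such layers with trained conditioners; this sketch shows only the mechanics of a single layer.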