What question did this study set out to answer?

To develop a framework that enables accurate and interpretable prostate cancer diagnosis while minimizing the need for extensive expert annotations.

March 22, 2026

ProCausal-WS: Weakly Supervised Causal Representation Learning Driven Interpretable Prostate Cancer Diagnosis

Key Points

Aimed to enable accurate and interpretable prostate cancer diagnosis while minimizing the need for extensive expert annotations.
Introduced ProCausal-WS, a weakly supervised causal representation learning framework.
Used an invertible flow causal encoder to map high-dimensional multimodal observations to a low-dimensional space of clinically interpretable causal factors.
Implemented a dynamic clinical intervention module for simulating treatment scenarios and generating counterfactual predictions.
Combined contrastive learning with weakly supervised alignment to enhance semantic identifiability.
Achieved 92.3% accuracy in clinical causal concept identification with only 8% complete expert annotations on TCGA-PRAD, and 89.6% with 5% annotations on PANDA.
Reduced intervention mean-squared error to 0.018 on TCGA-PRAD, one-fourth that of the best baseline.
Limited the AUROC drop to at most 0.026 when transferring between institutions with different scanners and staining protocols.
Expert evaluations indicated 89.6% of counterfactual predictions were biologically plausible.
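
To make the "invertible flow" idea in the points above concrete, here is a minimal sketch of a single affine coupling layer, a common building block of invertible flows. It is an illustration under stated assumptions, not the study's encoder: the function names and the toy scale and shift functions are hypothetical, and the sketch only demonstrates exact invertibility. It does not reduce dimension, and this summary does not say how ProCausal-WS obtains its low-dimensional factors.

```python
import math

def toy_scale(x_a):
    # Hypothetical stand-in for a learned network producing log-scales.
    return [math.tanh(0.5 * v) for v in x_a]

def toy_shift(x_a):
    # Hypothetical stand-in for a learned network producing shifts.
    return [0.1 * v for v in x_a]

def coupling_forward(x, scale_fn, shift_fn):
    """Affine coupling: pass the first half through, transform the second half.

    Assumes len(x) is even so both halves have the same length.
    """
    half = len(x) // 2
    x_a, x_b = x[:half], x[half:]
    s = scale_fn(x_a)  # log-scales conditioned on the untouched half
    t = shift_fn(x_a)  # shifts conditioned on the untouched half
    z_b = [xb * math.exp(si) + ti for xb, si, ti in zip(x_b, s, t)]
    log_det = sum(s)   # log |det Jacobian| of this layer
    return x_a + z_b, log_det

def coupling_inverse(z, scale_fn, shift_fn):
    """Exact inverse: the untouched half lets us recompute s and t."""
    half = len(z) // 2
    z_a, z_b = z[:half], z[half:]
    s = scale_fn(z_a)
    t = shift_fn(z_a)
    x_b = [(zb - ti) * math.exp(-si) for zb, si, ti in zip(z_b, s, t)]
    return z_a + x_b

x = [0.2, -1.0, 0.7, 1.5]
z, log_det = coupling_forward(x, toy_scale, toy_shift)
x_rec = coupling_inverse(z, toy_scale, toy_shift)
```

Because the first half of the input is copied unchanged, the inverse can recompute the same scales and shifts, so `x_rec` matches `x` up to floating-point error.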

Abstract

Current computational approaches to prostate cancer diagnosis rely on either linear causal models that cannot handle the nonlinear dependencies among imaging, genomic, and clinical variables, or on deep learning methods that demand exhaustive expert annotation and lack mechanisms for counterfactual reasoning. This work introduces ProCausal-WS, a weakly supervised causal representation learning framework that addresses both limitations simultaneously. It rests on three interlocking components: an invertible flow causal encoder that maps high-dimensional multimodal observations into a low-dimensional space of clinically interpretable causal factors through bijective transformations; an exogenous clinical intervention module that uses dynamic gating and structural equations to simulate treatment scenarios and generate controllable counterfactual predictions; and a weakly supervised alignment mechanism that combines contrastive learning with projection heads to constrain the semantic identifiability of the learned factors using only a small fraction of expert-labeled samples. On the TCGA-PRAD dataset, the framework achieves 92.3% clinical causal concept identification accuracy while requiring only 8% complete annotations, and on the PANDA dataset, it reaches 89.6% with 5% annotations. Intervention mean-squared error is reduced to 0.018 on TCGA-PRAD, one-fourth that of the best baseline. Cross-dataset generalization yields an AUROC drop of no more than 0.026 when transferring between institutions with different scanners and staining protocols. Expert pathologists rated 89.6% of the generated counterfactual predictions as biologically plausible, and a longitudinal consistency analysis against real post-treatment biopsies confirms that the counterfactuals track actual disease trajectories rather than hallucinating visually convincing but causally unfaithful features.
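
The counterfactual reasoning described in the abstract can be illustrated with the standard abduction, action, prediction recipe on a toy structural equation model. The sketch below is an assumption-laden illustration, not the paper's intervention module: the two-factor model, the linear equation `z2 = weight * z1 + u2`, and the scalar `gate` that blends the observed value with an intervention target are all hypothetical choices made for clarity.

```python
def abduct(z1, z2, weight):
    """Abduction: recover exogenous noise terms from the observed factors.

    Assumed toy model: z1 = u1 and z2 = weight * z1 + u2.
    """
    u1 = z1
    u2 = z2 - weight * z1
    return u1, u2

def counterfactual(z1, z2, weight, target, gate):
    """Return (z1_cf, z2_cf) under a gated intervention on z1.

    gate = 0.0 leaves the observed mechanism untouched;
    gate = 1.0 is a hard intervention setting z1 to target.
    """
    u1, u2 = abduct(z1, z2, weight)
    # Action: blend the natural value of z1 with the intervention target.
    z1_cf = (1.0 - gate) * u1 + gate * target
    # Prediction: propagate through the structural equation, keeping the
    # patient-specific noise u2 fixed.
    z2_cf = weight * z1_cf + u2
    return z1_cf, z2_cf

# Observed factors for one hypothetical patient.
observed = (2.0, 1.5)
no_change = counterfactual(*observed, weight=0.5, target=0.0, gate=0.0)
hard_do = counterfactual(*observed, weight=0.5, target=0.0, gate=1.0)
```

With the gate closed the observed factors are reproduced exactly, and with the gate fully open the downstream factor changes only through its structural dependence on the intervened factor, which is the property that makes such predictions controllable.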
