In recent years, artificial intelligence (AI) has made a significant impact on prostate cancer diagnosis using magnetic resonance imaging (MRI), particularly through diagnostic systems based on deep learning. Among these, convolutional neural networks trained for semantic segmentation of clinically significant lesions have gained attention due to their clinical value and inherent interpretability. Used as assistive tools, such systems have already been shown not only to increase diagnostic accuracy but also to reduce both inter-rater variability and diagnostic time. Despite these advances, standalone AI models for prostate cancer diagnosis still underperform expert radiologists. Radiologists' advantage may stem from clinical training that teaches them to account for physiological and modality-specific image alterations using domain knowledge and cognitive reasoning, aspects that current state-of-the-art computer-aided diagnosis systems largely overlook. To address this performance gap, this thesis advances prostate MRI interpretation by incorporating two real-world, yet often overlooked, challenges into AI model development: (1) frequent soft tissue deformations caused by physiological processes and (2) misalignment between multi-modal images. Both are forms of spatial variation to which segmentation networks are potentially sensitive. For each challenge, targeted, domain-informed strategies are proposed. These data-centric solutions are implemented as on-the-fly data augmentations during training, acting as inductive biases that improve model robustness against clinically relevant sources of image alteration. Although biomechanical models based on finite element methods hold strong potential for increasing prostate and lesion shape variability during training by simulating realistic soft tissue deformations, their practical utility is limited by their computational complexity and the need for specialized modeling expertise.
To make such deformations suitable for scalable online data augmentation, a lightweight model was developed based on simplified biomechanical assumptions. Incorporating these deformations into model training improved both patient-level diagnostic accuracy and lesion-level detection rates. Furthermore, anatomically realistic transformations proved beneficial compared with random elastic deformations, which tend to distort image features and compromise the fidelity of ground truth labels for benign and malignant conditions. The second clinical challenge addressed is alignment error between MRI modalities. While radiologists can cognitively compensate for such inconsistencies, computer-aided diagnosis systems rely on ground truth representations that are aligned across all image modalities. However, the literature lacks consensus on whether image co-registration is beneficial for model training, and when registration is applied, its effect on model performance is rarely reported. To investigate this systematically, multiple registration strategies were evaluated alongside a novel approach: misalignment augmentation. Instead of aiming for perfect anatomical alignment, this method introduces synthetic alignment errors during training so that network predictions become invariant to such errors. Registration and misalignment augmentation each improved performance independently. Combining the two had a synergistic effect owing to their complementary behavior, yielding a statistically significant improvement that brought diagnostic performance on par with expert radiologists. The results further showed that common surrogate registration metrics (e.g., the Dice coefficient) do not necessarily correlate with clinical task performance, emphasizing the importance of evaluating strategies by their impact on clinically relevant questions.
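The misalignment augmentation idea can be illustrated with a minimal sketch. It assumes a NumPy array of shape (C, D, H, W) in which channel 0 is the reference sequence (e.g., T2-weighted) on which the lesion labels are drawn, and it approximates inter-modality misalignment by random integer in-plane translations of the remaining channels. The function names `shift_zero_fill` and `misalignment_augmentation` and the parameters `max_shift` and `p` are hypothetical and are not taken from the thesis; the actual method may model misalignment differently.

```python
import numpy as np


def shift_zero_fill(vol: np.ndarray, dy: int, dx: int) -> np.ndarray:
    """Translate a (D, H, W) volume in-plane by (dy, dx) voxels.

    Voxels shifted in from outside the field of view are set to zero
    (no wrap-around, unlike np.roll).
    """
    out = np.zeros_like(vol)
    h, w = vol.shape[-2:]
    if abs(dy) >= h or abs(dx) >= w:
        return out  # the whole slice is shifted out of view
    # Matching source/destination windows along the H and W axes.
    src_y = slice(max(0, -dy), h - max(0, dy))
    dst_y = slice(max(0, dy), h - max(0, -dy))
    src_x = slice(max(0, -dx), w - max(0, dx))
    dst_x = slice(max(0, dx), w - max(0, -dx))
    out[..., dst_y, dst_x] = vol[..., src_y, src_x]
    return out


def misalignment_augmentation(image: np.ndarray, max_shift: int = 2,
                              p: float = 0.5, rng=None) -> np.ndarray:
    """Hypothetical misalignment augmentation for a (C, D, H, W) stack.

    Channel 0 (the reference the labels were annotated on) is never moved,
    so the labels stay aligned with it. Each other channel is, with
    probability p, shifted by an independent random integer offset in
    [-max_shift, max_shift] along H and W.
    """
    rng = np.random.default_rng() if rng is None else rng
    out = image.copy()
    for c in range(1, image.shape[0]):
        if rng.random() < p:
            dy, dx = rng.integers(-max_shift, max_shift + 1, size=2)
            out[c] = shift_zero_fill(image[c], int(dy), int(dx))
    return out
```

Because the labels are left untouched and remain registered to the reference channel, the network sees auxiliary channels that are slightly offset from the ground truth, which is what pushes its predictions toward invariance to such offsets. Rotations, sub-voxel shifts, or deformable offsets would be natural extensions of this simplified translation-only version.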
The proposed data-centric strategies proved effective, as reflected in the significant performance improvements observed on independent test sets. These findings underscore that incorporating domain knowledge into neural network training, with data augmentation serving as an inductive bias, can yield substantial benefits beyond those of generic state-of-the-art training pipelines. While the increasing availability of large-scale training data and the rise of generalist foundation models may reduce reliance on such targeted solutions for routine applications, the inherent complexity of medical imaging suggests that domain-specific strategies will likely remain essential for enabling neural networks to handle nuanced, clinically complex scenarios. This thesis makes a significant contribution to the field by demonstrating how clinically grounded, data-centric strategies can narrow the performance gap between automated systems and expert radiologists.
Bálint Kovács studied this question.