When a deep learning model for quantitative MRI (qMRI) is trained on one scanner vendor and deployed on another, it often fails dramatically. But why does it fail? We build a framework that isolates each physical factor and measures how much damage it causes to deep learning-based parameter estimation. We generate 100,000 synthetic MRF (Magnetic Resonance Fingerprinting) signals using Bloch-equation simulation, decompose vendor shift into 9 isolated physical corruptions, and test 4 robustness algorithms (ERM, DeepCORAL, GroupDRO, IRM) across 2 architectures (ResNet-1D, ViT-1D). We validate on real multi-scanner MRF data (30 brain scans across 3 scanners).

Key findings:

1. B₀ inhomogeneity is the dominant corruption, with a non-monotonic dose-response peaking at 25 Hz.
2. Peak normalization masks B₁⁺ sensitivity; without it, halving B₁ causes a 39% error increase.
3. All standard robustness algorithms fail (DS3 = 39–58×); IRM fails entirely.
4. Data scaling cannot overcome uncalibrated physics: more source data does not improve out-of-distribution performance.
5. A simple hybrid combining deep learning with classical dictionary matching outperforms either approach alone.
6. Real multi-scanner data confirms that T₂ is 3.5× more variable than T₁ across scanners.

This repository contains the full paper (LaTeX source, 16 pages, 10 figures), the source code and GitHub link (Bloch simulator, ResNet-1D, ViT-1D, ERM/CORAL/GroupDRO/IRM), 26 measured experiments (all checkpointed and resumable), and all result data in JSON format.
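For readers unfamiliar with the classical dictionary matching mentioned in finding 5, the following is a minimal sketch of the general idea: precompute normalized signals over a grid of (T₁, T₂) values and pick the entry with the largest normalized inner product. It is not the repository's implementation. The signal model `toy_signal` is a deliberately simplified stand-in for a Bloch simulation, and every function name, grid value, and timing parameter here is a hypothetical choice made for illustration.

```python
import math


def toy_signal(t1_ms, t2_ms, n_points=50, tr_ms=10.0):
    """Toy stand-in for a simulated fingerprint (assumption: NOT a real MRF sequence)."""
    return [
        (1.0 - math.exp(-(k + 1) * tr_ms / t1_ms)) * math.exp(-(k + 1) * tr_ms / t2_ms)
        for k in range(n_points)
    ]


def normalize(signal):
    """Scale a signal to unit L2 norm so matching ignores global amplitude."""
    norm = math.sqrt(sum(s * s for s in signal))
    return [s / norm for s in signal]


def build_dictionary(t1_values, t2_values):
    """Precompute normalized signals for every (T1, T2) pair with T2 < T1."""
    return [
        ((t1, t2), normalize(toy_signal(t1, t2)))
        for t1 in t1_values
        for t2 in t2_values
        if t2 < t1
    ]


def match(signal, dictionary):
    """Return the (T1, T2) whose dictionary atom has the largest inner product."""
    query = normalize(signal)
    best_params, _ = max(
        dictionary,
        key=lambda entry: sum(q * d for q, d in zip(query, entry[1])),
    )
    return best_params


# Hypothetical coarse grid, in milliseconds.
dictionary = build_dictionary([500, 1000, 1500], [50, 100, 150])
estimate = match(toy_signal(1000, 100), dictionary)
```

Because both the query and the dictionary atoms are normalized, the match is insensitive to a global scaling of the measured signal. In this sketch the estimate can only take values on the dictionary grid, which is one reason a hybrid with a learned estimator might be attractive.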
Sreenath P. Kyathanahally studied this question.