June 12, 2026 | Open Access

Why Deep Learning Fails Across MRI Vendors: A Physics Attribution Study with Dose-Response Curves, Scaling Laws, and Hybrid Rescues

Key Points

The study aims to identify physical factors causing deep learning model failures in quantitative MRI across different scanner vendors.
Generated 100,000 synthetic MRF signals using Bloch-equation simulation.
Isolated 9 physical corruptions due to vendor shift and tested 4 robustness algorithms across 2 architectures.
Validated findings on real multi-scanner MRF data from 30 brain scans across 3 scanners.
B₀ inhomogeneity is the dominant corruption, with a non-monotonic dose-response that peaks at 25 Hz.
Peak normalization masks B₁⁺ sensitivity; without it, halving B₁ causes a 39% increase in error.
All standard robustness algorithms fail (DS3 = 39–58×), and IRM fails entirely.
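
To make the peak-normalization point concrete, the sketch below shows how dividing a signal by its peak hides a global amplitude change. It is a toy illustration, not the study's pipeline: `toy_signal` is a hypothetical mono-exponential decay rather than a Bloch simulation, and `b1_scale` is assumed to act as a pure amplitude factor, which is only approximately true at small flip angles. A real B₁ change also alters the signal shape, which this sketch does not capture.

```python
import math


def peak_normalize(signal):
    """Divide a signal by its largest absolute value."""
    peak = max(abs(s) for s in signal)
    if peak == 0:
        raise ValueError("cannot normalize an all-zero signal")
    return [s / peak for s in signal]


def toy_signal(t2_ms, n_points=50, tr_ms=10.0, b1_scale=1.0):
    """Hypothetical mono-exponential decay (NOT a Bloch simulation).

    b1_scale is modeled as a pure amplitude factor, an assumption that
    only roughly holds for small flip angles.
    """
    return [b1_scale * math.exp(-i * tr_ms / t2_ms) for i in range(n_points)]


nominal = toy_signal(t2_ms=80.0)
half_b1 = toy_signal(t2_ms=80.0, b1_scale=0.5)

# Largest pointwise difference before and after peak normalization.
raw_gap = max(abs(a - b) for a, b in zip(nominal, half_b1))
norm_gap = max(
    abs(a - b)
    for a, b in zip(peak_normalize(nominal), peak_normalize(half_b1))
)

print(f"raw gap: {raw_gap:.3f}, normalized gap: {norm_gap:.3e}")
```

Under this simplification the two raw signals differ clearly, while their normalized versions coincide, so a model that only ever sees normalized inputs cannot observe the amplitude change.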

Abstract

When a deep learning model for quantitative MRI (qMRI) is trained on one scanner vendor and deployed on another, it often fails dramatically. But why does it fail? We build a framework to isolate each physical factor and measure how much damage it causes to deep learning-based parameter estimation. We generate 100,000 synthetic Magnetic Resonance Fingerprinting (MRF) signals using Bloch-equation simulation, decompose vendor shift into 9 isolated physical corruptions, and test 4 robustness algorithms (ERM, DeepCORAL, GroupDRO, IRM) across 2 architectures (ResNet-1D, ViT-1D). We validate on real multi-scanner MRF data (30 brain scans across 3 scanners).

Key findings:
1. B₀ inhomogeneity is the dominant corruption, with a non-monotonic dose-response peaking at 25 Hz.
2. Peak normalization masks B₁⁺ sensitivity; without it, halving B₁ causes a 39% error increase.
3. All standard robustness algorithms fail (DS3 = 39–58×); IRM fails entirely.
4. Data scaling cannot overcome uncalibrated physics: more source data does not improve out-of-distribution performance.
5. A simple hybrid combining deep learning with classical dictionary matching outperforms either approach alone.
6. Real multi-scanner data confirms that T₂ is 3.5× more variable than T₁ across scanners.

This repository contains the full paper (LaTeX source, 16 pages, 10 figures), the source code via a GitHub link (Bloch simulator, ResNet-1D, ViT-1D, ERM/CORAL/GroupDRO/IRM), 26 measured experiments (all checkpointed and resumable), and all result data in JSON format.
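
For readers unfamiliar with the classical side of the hybrid, the sketch below illustrates dictionary matching in its generic form: each candidate parameter value gets a simulated, unit-norm signal, and the measured signal is assigned the parameter of the entry with the largest normalized inner product. This is an illustration under stated assumptions, not the repository's code: `toy_signal` is a hypothetical mono-exponential decay standing in for a Bloch-simulated fingerprint, the T₂ grid is arbitrary, and how the study combines this step with the network is not specified here.

```python
import math


def toy_signal(t2_ms, n_points=50, tr_ms=10.0):
    """Hypothetical mono-exponential decay standing in for an MRF fingerprint."""
    return [math.exp(-i * tr_ms / t2_ms) for i in range(n_points)]


def unit_norm(signal):
    """Scale a signal to unit Euclidean norm."""
    norm = math.sqrt(sum(s * s for s in signal))
    return [s / norm for s in signal]


def build_dictionary(t2_grid_ms):
    """Pair each candidate T2 with its unit-norm simulated signal."""
    return [(t2, unit_norm(toy_signal(t2))) for t2 in t2_grid_ms]


def dictionary_match(signal, dictionary):
    """Return the T2 whose dictionary entry best correlates with the signal."""
    query = unit_norm(signal)
    best_t2, best_score = None, -math.inf
    for t2, atom in dictionary:
        score = abs(sum(q * a for q, a in zip(query, atom)))
        if score > best_score:
            best_t2, best_score = t2, score
    return best_t2, best_score


# Arbitrary illustrative grid: 20 ms to 200 ms in 10 ms steps.
dictionary = build_dictionary(range(20, 201, 10))

# A "measured" signal with an unknown gain; matching ignores the gain
# because both the query and the dictionary entries are unit-normalized.
measured = [3.0 * s for s in toy_signal(80.0)]
t2_estimate, score = dictionary_match(measured, dictionary)

print(f"estimated T2: {t2_estimate} ms (score {score:.4f})")
```

Because the lookup depends only on the simulated physics and not on training data, it is a natural complement to a learned estimator, although its resolution is limited by the spacing of the grid.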
