An ultrasound video analysis model using temporal fusion and heatmap regression achieved an mIoU of 83.22% for segmentation and an mAP of 0.698 for keypoint detection, outperforming baseline models.
Does a temporal fusion and heatmap regression model improve the precise measurement of left ventricular parameters in echocardiographic PLAX videos compared to standard deep learning models?
A novel temporal fusion and heatmap regression model significantly improves the accuracy of left ventricular parameter measurements from echocardiographic PLAX videos compared to existing deep learning benchmarks.
Abstract

Background: Left ventricular geometric parameters are critical for the diagnosis and prognosis of cardiovascular diseases. Currently, most measurement techniques rely on two-dimensional transthoracic echocardiography (TTE), where an end-diastolic (ED) frame from the parasternal long-axis (PLAX) view is selected, and key points on the interventricular septum (IVS), left ventricular internal dimension (LVID), and left ventricular posterior wall (LVPW) are identified. However, a single frame often fails to capture the entire structure of the IVS and LVPW, especially when complex anatomical details or blurred edges are present. This leads to positional shifts or loss of key points and, hence, considerable measurement errors.

Purpose: In this study, we propose an automatic method for measuring left ventricular structural parameters based on echocardiographic PLAX-view videos. The approach focuses on the ED frame along with the immediately preceding and following frames.

Methods: We developed an ultrasound video analysis model that integrates temporally distributed and incomplete structural information to reconstruct the complete anatomies of the IVS and LVPW. The model combines a segmentation branch for precise boundary localization with a heatmap regression branch for chamber centerline and LVID measurement line estimation, enforcing perpendicular anatomical constraints. The dataset comprised 400 PLAX echocardiographic videos from 400 distinct patients, acquired at 56 fps. The data were divided into training and validation sets in a ratio of 8:2. The proposed model was compared with U-Net, U-Net++, DeepLabV3, SegFormer, and TransUNet for segmentation, and with HRNet and ViTPose for keypoint detection. Evaluation metrics included mean intersection over union (mIoU), Dice similarity coefficient (DSC), Hausdorff distance (HD), and average precision (including mAP). Statistical significance was assessed using paired t-tests, and multiple comparisons were corrected using the Benjamini–Hochberg (BH) procedure.

Results: Our results demonstrate robust performance improvements over existing benchmarks. In the segmentation task, our method achieved an mIoU of 83.22% (DSC 0.856, HD 10.174). Statistical analysis showed that this performance is significantly superior to that of classic models such as U-Net, with a positive small-to-medium effect size. In the keypoint detection task, our approach achieved an mAP of 0.698 ( = 0.965), significantly outperforming the DeepLabV3 baseline with a positive medium-to-large effect size. Moreover, against strong baselines such as ViTPose, our method maintained a statistically significant advantage with a positive small effect size.

Conclusions: These outcomes demonstrate the method's robust performance in accurately delineating structural boundaries and reducing measurement errors.
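The Benjamini–Hochberg correction named in the Methods can be illustrated with a minimal sketch. This is not the authors' code: the function name `benjamini_hochberg` and the `alpha` argument are assumptions for illustration, and the significance threshold used in the study is not reproduced here, so it must be passed in explicitly. The input would be the raw p-values from the paired t-tests, one per model comparison.

```python
def benjamini_hochberg(p_values, alpha):
    """Return BH-adjusted p-values and reject decisions at FDR level alpha.

    p_values: raw p-values, one per comparison (hypothetical input).
    alpha: false discovery rate level chosen by the analyst (assumption).
    """
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, scaling each by m / rank and
    # keeping a running minimum so adjusted values stay monotone in rank.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    rejected = [q <= alpha for q in adjusted]
    return adjusted, rejected
```

A comparison is declared significant when its adjusted p-value does not exceed `alpha`, which controls the expected proportion of false discoveries across all comparisons rather than the per-test error rate.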
Chen et al. studied left ventricular parameter measurement (n=400). A temporal fusion and heatmap regression model was compared with classic models (e.g., U-Net, DeepLabV3, ViTPose) and evaluated on mean intersection over union (mIoU) for segmentation.
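For readers unfamiliar with the segmentation metrics, the following sketch shows how per-class IoU and Dice, and their class average (mIoU), can be computed from two label maps. It is an illustration under stated assumptions, not the study's evaluation code: the function names, the integer class labels, and the choice to average over classes present in either map are all hypothetical, and the averaging convention used by the authors is not specified here.

```python
def iou_and_dice(pred, target, label):
    """IoU and Dice for one class over two equally sized 2D label maps.

    Returns None when the class is absent from both maps.
    """
    inter = pred_count = target_count = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            in_pred, in_target = p == label, t == label
            inter += int(in_pred and in_target)
            pred_count += int(in_pred)
            target_count += int(in_target)
    union = pred_count + target_count - inter
    if union == 0:
        return None
    iou = inter / union
    dice = 2 * inter / (pred_count + target_count)
    return iou, dice


def mean_iou(pred, target, labels):
    """Average IoU over the given class labels, skipping absent classes."""
    scores = [iou_and_dice(pred, target, c) for c in labels]
    ious = [s[0] for s in scores if s is not None]
    if not ious:
        raise ValueError("none of the requested labels occur in either map")
    return sum(ious) / len(ious)
```

Dice weights the overlap twice relative to the combined area, so for the same prediction it is never lower than IoU, which is consistent with the DSC and mIoU values reported above being of different magnitude.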