PULSE, a multimodal large language model trained on over one million ECG images, outperformed general-purpose MLLMs by 21% to 33% in average accuracy across diverse ECG interpretation tasks.
Does PULSE, a multimodal large language model trained on ECG images, improve ECG interpretation accuracy compared to general-purpose MLLMs?
PULSE, a novel open-source multimodal large language model trained on over one million ECG images, establishes a new state-of-the-art for automated ECG image interpretation, significantly outperforming general-purpose models.
Abstract: Electrocardiograms (ECGs) are essential, non-invasive diagnostic tools for assessing cardiac conditions. Existing methods often have limited generalizability, focus on narrow condition sets, and rely on raw physiological signals, which may be unavailable in resource-limited settings where only printed or digital ECG images are accessible. Recent advances in multimodal large language models (MLLMs) offer new opportunities, yet ECG image interpretation remains challenging due to the lack of instruction-tuning data and standardized benchmarks. To address these gaps, we introduce the first large-scale ECG image instruction-tuning dataset, with over one million samples, covering diverse tasks including feature recognition, rhythm analysis, morphology assessment, and clinical report generation. We develop PULSE, a fully open-source MLLM for ECG image interpretation trained on this dataset. We further curate ECGBench, a human expert-developed benchmark spanning four core ECG interpretation tasks across nine datasets, incorporating both synthesized and real-world ECG images to enable clinically realistic evaluation. Our experiments demonstrate that PULSE establishes a new state of the art, outperforming general-purpose MLLMs by 21% to 33% in average accuracy. These results highlight the potential of PULSE to improve ECG image interpretation in clinical practice. All code, data and models are available at https://aimedlab.github.io/PULSE/.
Liu et al. evaluated PULSE, a multimodal large language model for ECG interpretation in cardiovascular disease, against proprietary MLLMs (GPT-4o, Gemini 1.5 Pro, Claude 3.5 Sonnet) and open-source MLLMs. Performance was measured on ECGBench using accuracy, AUC, F1, and report score.
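For readers unfamiliar with two of the metrics named above, the following is a minimal sketch of how accuracy and macro-averaged F1 can be computed from gold and predicted labels. It is not the ECGBench evaluation code: the function names `accuracy` and `macro_f1` and the toy labels are hypothetical and chosen only for illustration, and the source does not state which F1 averaging the benchmark uses.

```python
from collections import Counter


def accuracy(gold, pred):
    """Fraction of examples whose predicted label equals the gold label."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    if not gold:
        raise ValueError("need at least one example")
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def macro_f1(gold, pred):
    """Unweighted mean of per-class F1 over every label seen in gold or pred."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    if not gold:
        raise ValueError("need at least one example")
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted class p, but it was wrong
            fn[g] += 1  # gold class g was missed
    labels = set(gold) | set(pred)
    # Per-class F1 = 2*TP / (2*TP + FP + FN); the denominator is never zero
    # because every label here appears at least once in gold or pred.
    scores = [2 * tp[c] / (2 * tp[c] + fp[c] + fn[c]) for c in labels]
    return sum(scores) / len(scores)


# Toy labels, invented for illustration only (not taken from any dataset).
gold_labels = ["AF", "NORM", "AF", "MI"]
pred_labels = ["AF", "AF", "AF", "MI"]
print("accuracy:", accuracy(gold_labels, pred_labels))
print("macro F1:", macro_f1(gold_labels, pred_labels))
```

Macro averaging weights every class equally, so a rare class that is always missed pulls the score down even when overall accuracy looks high.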