What question did this study set out to answer?

The aim is to improve electrocardiogram image interpretation using multimodal large language models.

March 18, 2026 · Open Access

Teaching multimodal LLMs to comprehend 12-lead electrocardiographic images

What is the clinical evidence from this study?

Study design: Other. Population: Cardiovascular diseases (ECG interpretation). Intervention: PULSE (Multimodal Large Language Model) vs. Proprietary MLLMs (GPT-4o, Gemini 1.5 Pro, Claude 3.5 Sonnet) and open-source MLLMs. Primary outcome: Performance on ECGBench (Accuracy, AUC, F1, Report Score).

Key Result

PULSE, a multimodal large language model trained on over one million ECG images, outperformed general-purpose MLLMs by 21% to 33% in average accuracy across diverse ECG interpretation tasks.

Key Points

The aim is to improve electrocardiogram image interpretation using multimodal large language models.
Introduction of a large-scale ECG image instruction-tuning dataset with over one million samples.
Development of an open-source multimodal large language model for ECG imagery.
Creation of a human expert-developed benchmark for evaluating ECG interpretation across multiple datasets.
The new model outperforms general-purpose multimodal large language models by 21% to 33% in average accuracy.
Evaluation on a human expert-developed benchmark indicates substantial improvements in ECG image interpretation over general-purpose models.

Structured PICO

Does PULSE, a multimodal large language model trained on ECG images, improve ECG interpretation accuracy compared to general-purpose MLLMs?

Population

Over one million synthesized and real-world 12-lead ECG images covering diverse tasks including feature recognition, rhythm analysis, morphology assessment, and clinical report generation.

Intervention

PULSE, a fully open-source 7B multimodal large language model (MLLM) trained on the ECGInstruct dataset.

Comparator

General-purpose proprietary MLLMs (e.g., GPT-4o, Gemini 1.5 Pro, Claude 3.5 Sonnet), open-source MLLMs (e.g., LLaVA), and domain-specific signal-based methods.

Outcome

Performance on ECG interpretation tasks measured by Macro AUC, Macro F1, Hamming Loss, accuracy, and Report Perfect Score.
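Two of the metrics named above, Macro F1 and Hamming Loss, can be illustrated with a minimal sketch under their standard multi-label definitions. This is not the study's evaluation code: the function names, the toy label matrix, and the convention of scoring an undefined F1 as 0.0 are all assumptions made for illustration.

```python
from typing import Sequence

Matrix = Sequence[Sequence[int]]  # rows = ECGs, columns = binary labels


def hamming_loss(y_true: Matrix, y_pred: Matrix) -> float:
    """Fraction of (sample, label) slots that are predicted incorrectly."""
    total = sum(len(row) for row in y_true)
    wrong = sum(
        t != p
        for t_row, p_row in zip(y_true, y_pred)
        for t, p in zip(t_row, p_row)
    )
    return wrong / total


def macro_f1(y_true: Matrix, y_pred: Matrix) -> float:
    """Unweighted mean of the per-label F1 scores."""
    n_labels = len(y_true[0])
    scores = []
    for j in range(n_labels):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 1 and p[j] == 1)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 0 and p[j] == 1)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 1 and p[j] == 0)
        denom = 2 * tp + fp + fn
        # Assumption: a label with no positives and no predictions scores 0.0.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / n_labels


# Hypothetical toy data: 4 ECGs, 3 diagnostic labels (not from the study).
y_true = [[1, 0, 0], [0, 1, 0], [1, 1, 0], [0, 0, 1]]
y_pred = [[1, 0, 0], [0, 1, 1], [1, 0, 0], [0, 0, 1]]

print(f"Hamming loss: {hamming_loss(y_true, y_pred):.3f}")
print(f"Macro F1:     {macro_f1(y_true, y_pred):.3f}")
```

Macro averaging gives every label equal weight regardless of how often it occurs, so rare diagnoses count as much as common ones. Hamming Loss is an error rate, so lower values are better.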

PULSE, a novel open-source multimodal large language model trained on over one million ECG images, establishes a new state-of-the-art for automated ECG image interpretation, significantly outperforming general-purpose models.

Limitations

Original reports in training datasets are not always fully narrative and may consist of structured key points or multilingual entries
Lack of richer clinician-authored narrative reports for training
Need for prospective testing in hospital environments to bridge the gap between controlled evaluation and clinical practice
Risk of incorrect or overconfident outputs and potential dataset biases
Complex and open-ended tasks remain challenging and demand stronger reasoning and instruction-following capabilities

Abstract

Electrocardiograms (ECGs) are essential, non-invasive diagnostic tools for assessing cardiac conditions. Existing methods often have limited generalizability, focus on narrow condition sets, and rely on raw physiological signals, which may be unavailable in resource-limited settings where only printed or digital ECG images are accessible. Recent advances in multimodal large language models (MLLMs) offer new opportunities, yet ECG image interpretation remains challenging due to the lack of instruction-tuning data and standardized benchmarks. To address these gaps, we introduce ECGInstruct, the first large-scale ECG image instruction-tuning dataset with over one million samples, covering diverse tasks including feature recognition, rhythm analysis, morphology assessment, and clinical report generation. We develop PULSE, a fully open-source MLLM for ECG image interpretation trained on ECGInstruct. We further curate ECGBench, a human expert-developed benchmark spanning four core ECG interpretation tasks across nine datasets, incorporating both synthesized and real-world ECG images to enable clinically realistic evaluation. Our experiments demonstrate that PULSE establishes a new state of the art, outperforming general-purpose MLLMs by 21% to 33% in average accuracy. These results highlight the potential of PULSE to improve ECG image interpretation in clinical practice. All code, data and models are available at https://aimedlab.github.io/PULSE/.

Cite This Study

Liu et al. evaluated PULSE, a multimodal large language model for ECG interpretation in cardiovascular disease, against proprietary MLLMs (GPT-4o, Gemini 1.5 Pro, Claude 3.5 Sonnet) and open-source MLLMs on ECGBench (accuracy, AUC, F1, report score). PULSE, trained on over one million ECG images, outperformed general-purpose MLLMs by 21% to 33% in average accuracy across diverse ECG interpretation tasks.

synapsesocial.com/papers/69ba430d4e9516ffd37a3d56
https://doi.org/10.1038/s41746-026-02551-3
