What is the clinical evidence from this study?

Study design: Observational. Population: Cardiovascular disease (Coronary Artery Disease and Heart Failure) (n=747). Intervention: Multimodal deep learning model (pulse wave and vocal signals) vs. Single-modality models (pulse-only and voice-only). Primary outcome: Model accuracy and AUC in external validation cohort (AUC 0.8454).

What question did this study set out to answer?

This study aims to develop and validate a multimodal prediction model for cardiovascular disease detection using pulse wave and vocal signals.

April 27, 2026 · Open Access

Multimodal deep learning for cardiovascular disease detection using pulse wave and vocal signals: a prediction model development and validation study

Key result

A multimodal deep learning model integrating radial artery pulse wave and vocal signals achieved an accuracy of 0.7165 and an AUC of 0.8454 for detecting cardiovascular disease in an external cohort.

Key points

Collected radial artery pulse waves and vocal signals from 553 subjects under standardized conditions.
Compared single-modality models with multimodal fusion models using four deep learning architectures.
Used ten-fold cross-validation and validated with an independent cohort of 194 subjects.
Fused model achieved an accuracy of 0.7165 and AUC of 0.8454 in the validation cohort.
Explainable AI analyses highlighted pulse dynamic variables and vocal biomarkers as key features contributing to model predictions.
Demonstrated feasibility of integrating TCM-informed diagnostic signals within a modern AI framework.
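The ten-fold cross-validation mentioned above can be sketched as follows. This is a minimal illustration, assuming a plain shuffled split over the 553 derivation subjects; the source does not say whether folds were stratified by class or which seed was used, so `k_fold_indices` and its `seed` parameter are hypothetical names chosen for this example.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs for k-fold cross-validation.

    Assumption: a simple shuffled, non-stratified split. The study may
    have used a different scheme.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Fold sizes differ by at most one sample.
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        start += size
        yield train_idx, test_idx


# 553 subjects in the derivation cohort, as reported in the summary.
folds = list(k_fold_indices(553, k=10))
fold_sizes = [len(test_idx) for _, test_idx in folds]
print(fold_sizes)  # three folds of 56 subjects and seven of 55
```

Each subject appears in exactly one held-out fold, so every model is evaluated on data it was not trained on. The 194-subject external cohort stays outside this loop entirely.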

Study design

Type

Observational (n=747)

Multicenter

Yes

Structured PICO

Does a multimodal deep learning model integrating radial pulse waves and vocal signals improve the automated detection and discrimination of cardiovascular disease (CAD and HF) compared to single-modality models?

Population

747 subjects (553 in the main derivation cohort and 194 in an independent external validation cohort) comprising healthy individuals, patients with coronary artery disease (CAD) adjudicated via coronary angiography or CTA, and patients with heart failure (HF) with objective evidence of cardiac dysfunction and NYHA class III-IV. Patients with severe cardiac arrhythmias, hemodynamically significant valvular heart disease, severe cardiomyopathy, or other major systemic diseases were excluded.

Intervention

Multimodal deep learning prediction model (ResNet-MLP architecture) integrating radial artery pulse waves (reflecting hemodynamics) and vocal signals of sustained /a:/ phonation.
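To make the idea of a fused ResNet-MLP classifier concrete, here is a rough forward-pass sketch. It is not the authors' architecture: the summary does not describe how the two modalities are combined, so the sketch assumes simple feature concatenation followed by one residual block and a three-class softmax. The layer sizes, random weights, feature values, and all function names are hypothetical.

```python
import math
import random


def linear(x, weights, bias):
    """Dense layer: one output per weight row."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def relu(x):
    return [max(0.0, v) for v in x]


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def init_layer(n_in, n_out, rng):
    """Random, untrained weights; for illustration only."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out


def fused_forward(pulse_feats, voice_feats, params):
    # Assumed fusion strategy: concatenate the two feature vectors.
    x = pulse_feats + voice_feats
    h = relu(linear(x, *params["proj"]))
    # Residual block: the block's output is added back to its input.
    r = relu(linear(h, *params["res1"]))
    r = linear(r, *params["res2"])
    h = relu([a + b for a, b in zip(h, r)])
    return softmax(linear(h, *params["out"]))


rng = random.Random(0)
hidden = 8
pulse = [0.42, 1.10, 0.35]       # made-up pulse features (e.g. a t3/tmax ratio)
voice = [12.5, -3.2, 0.8, 1.9]   # made-up vocal features (e.g. MFCC values)
params = {
    "proj": init_layer(len(pulse) + len(voice), hidden, rng),
    "res1": init_layer(hidden, hidden, rng),
    "res2": init_layer(hidden, hidden, rng),
    "out": init_layer(hidden, 3, rng),  # Healthy, CAD, HF
}
probs = fused_forward(pulse, voice, params)
print(probs)
```

A pulse-only or voice-only baseline in this sketch would simply pass one feature vector instead of the concatenation, which is the comparison the study reports.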

Comparator

Single-modality prediction models (pulse-only and voice-only) and alternative deep learning architectures (MLP, GAN-Discriminator, and Bi-LSTM).

Outcome

Automated cardiovascular disease discrimination (classification accuracy and Area Under the Curve [AUC] for distinguishing between Healthy, CAD, and HF subjects).

A multimodal deep learning model integrating non-invasive radial pulse waves and vocal signals demonstrates feasibility for the automated screening and discrimination of coronary artery disease and heart failure.
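The two reported metrics, accuracy and AUC for a three-class problem, can be computed as in the sketch below. The summary does not state how the multi-class AUC was aggregated; this example assumes a macro-averaged one-vs-rest AUC, and the toy labels and probabilities are invented purely for illustration.

```python
def binary_auc(labels, scores):
    """AUC as the probability that a positive outranks a negative.

    Ties between a positive and a negative score count as 0.5.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def macro_ovr_auc(y_true, probs, n_classes):
    """Average the one-vs-rest AUC over classes (assumed aggregation)."""
    aucs = []
    for c in range(n_classes):
        labels = [1 if y == c else 0 for y in y_true]
        scores = [row[c] for row in probs]
        aucs.append(binary_auc(labels, scores))
    return sum(aucs) / n_classes


def accuracy(y_true, probs):
    """Fraction of samples whose highest-probability class is correct."""
    preds = [max(range(len(row)), key=row.__getitem__) for row in probs]
    return sum(p == y for p, y in zip(preds, y_true)) / len(y_true)


# Toy data: 0 = Healthy, 1 = CAD, 2 = HF (values are made up).
y_true = [0, 0, 1, 1, 2, 2]
probs = [
    [0.7, 0.2, 0.1],
    [0.4, 0.5, 0.1],
    [0.2, 0.6, 0.2],
    [0.3, 0.4, 0.3],
    [0.1, 0.2, 0.7],
    [0.2, 0.3, 0.5],
]
toy_accuracy = accuracy(y_true, probs)
toy_auc = macro_ovr_auc(y_true, probs, n_classes=3)
print(toy_accuracy, toy_auc)
```

Accuracy depends on the single predicted class per subject, while AUC depends on how well the predicted probabilities rank subjects, which is why the two numbers can differ considerably. For scale, the reported accuracy of 0.7165 on 194 subjects is consistent with 139 correct classifications (139/194 ≈ 0.7165).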

Numerical result

Effect estimate: AUC 0.8454

Abstract

Cardiovascular diseases (CVDs), particularly coronary artery disease (CAD) and its severe sequela, heart failure (HF), pose a major global public health burden, necessitating efficient, non-invasive early screening tools. Traditional Chinese Medicine (TCM) offers a holistic diagnostic paradigm through the “four examinations,” in which “pulse diagnosis” (Mai Zhen, perceiving internal hemodynamics) and “auscultation-olfaction diagnosis” (Wen Zhen, assessing external systemic manifestations via voice) are pivotal. However, these subjective methods lack objective quantification. This study aims to operationalize the TCM “combined diagnosis” (He-Can) principle for automated CVD discrimination through a prediction model development and validation study.
We implemented a rigorous multimodal data acquisition protocol. Radial artery pulse waves (reflecting TCM pulse diagnosis) were collected under standardized conditions, and vocal signals (reflecting TCM auscultation) of sustained /a:/ phonation were synchronously recorded from 553 subjects (Healthy, CAD, HF). To assess the potential added value of multimodal integration, we systematically compared single-modality models (pulse-only and voice-only) with multimodal fusion models using four deep learning architectures (MLP, GAN-Discriminator, ResNet-MLP, and Bi-LSTM) under ten-fold cross-validation.
In an independent external validation cohort (n = 194), the fused model achieved an accuracy of 0.7165 and an AUC of 0.8454, indicating maintained performance on unseen data. Explainable AI analyses (SHAP and LIME) suggested that the model’s predictions drew on both pulse-derived hemodynamic features and vocal acoustic features. Among the higher-contributing features were pulse dynamic variables (e.g., t3/tmax) and vocal acoustic biomarkers (e.g., MFCCs).
This study provides empirical support for the feasibility of digitally integrating selected TCM-informed diagnostic signals within a prediction model framework. By digitizing and integrating pulse and voice signals, we developed a multimodal prediction model with potential utility for non-invasive cardiovascular screening in primary care, while linking the TCM concept of combined diagnosis with contemporary artificial intelligence and cardiovascular physiology.
