A multimodal deep learning model integrating radial artery pulse wave and vocal signals achieved an accuracy of 0.7165 and an AUC of 0.8454 for detecting cardiovascular disease in an external cohort.
Observational (n=747)
Yes
Does a multimodal deep learning model integrating radial pulse waves and vocal signals improve the automated detection and discrimination of cardiovascular disease (CAD and HF) compared to single-modality models?
A multimodal deep learning model integrating non-invasive radial pulse waves and vocal signals demonstrates feasibility for the automated screening and discrimination of coronary artery disease and heart failure.
Effect estimate: AUC 0.8454
Cardiovascular diseases (CVDs), particularly coronary artery disease (CAD) and its severe sequela, heart failure (HF), pose a massive global public health burden, necessitating efficient, non-invasive early screening tools. Traditional Chinese Medicine (TCM) offers a unique holistic diagnostic paradigm through the “four examinations,” in which “pulse diagnosis” (Mai Zhen, perceiving internal hemodynamics) and “auscultation-olfaction diagnosis” (Wen Zhen, assessing external systemic manifestations via voice) are pivotal. However, these methods are subjective and lack objective quantification. This study aims to scientifically operationalize the TCM “combined diagnosis” (He-Can) principle for automated CVD discrimination through a prediction model development and validation study. We implemented a rigorous multimodal data acquisition protocol. Radial artery pulse waves (reflecting TCM pulse diagnosis) were collected under standardized conditions, and vocal signals (reflecting TCM auscultation) of sustained /a:/ phonation were synchronously recorded from 553 subjects (Healthy, CAD, HF). To assess the potential added value of multimodal integration, we systematically compared single-modality models (pulse-only and voice-only) with multimodal fusion models using four deep learning architectures (MLP, GAN-Discriminator, ResNet-MLP, and Bi-LSTM) under ten-fold cross-validation. In an independent external validation cohort (n = 194), the fused model achieved an accuracy of 0.7165 and an AUC of 0.8454, indicating maintained performance on unseen data. Explainable AI analyses (SHAP and LIME) suggested that the model’s predictions drew on both pulse-derived hemodynamic features and vocal acoustic features; among the higher-contributing features were pulse dynamic variables (e.g., t3/tmax) and vocal acoustic biomarkers (e.g., MFCCs).
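The comparison of pulse-only, voice-only, and fused inputs under ten-fold cross-validation can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the data are synthetic, fusion is done by simple feature concatenation (the abstract does not state the fusion strategy), and a nearest-centroid classifier stands in for the study's deep learning architectures. The names `stratified_kfold`, `fuse`, and `cv_accuracy` are hypothetical helpers, not the authors' code.

```python
import random


def stratified_kfold(labels, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs, spreading each class evenly over k folds."""
    rng = random.Random(seed)
    by_class = {}
    for i, y in enumerate(labels):
        by_class.setdefault(y, []).append(i)
    folds = [[] for _ in range(k)]
    for idxs in by_class.values():
        rng.shuffle(idxs)
        for j, i in enumerate(idxs):
            folds[j % k].append(i)
    for f in range(k):
        test = sorted(folds[f])
        train = sorted(i for g in range(k) if g != f for i in folds[g])
        yield train, test


def fuse(pulse_vec, voice_vec):
    """Feature-level fusion by concatenation (assumed, not confirmed by the source)."""
    return list(pulse_vec) + list(voice_vec)


def nearest_centroid_accuracy(X, y, train, test):
    """Stand-in classifier: assign each test sample to the closest class mean."""
    sums, counts = {}, {}
    for i in train:
        c = y[i]
        if c not in sums:
            sums[c], counts[c] = [0.0] * len(X[i]), 0
        sums[c] = [s + v for s, v in zip(sums[c], X[i])]
        counts[c] += 1
    centroids = {c: [s / counts[c] for s in sums[c]] for c in sums}

    def predict(x):
        return min(
            centroids,
            key=lambda c: sum((a - b) ** 2 for a, b in zip(x, centroids[c])),
        )

    return sum(predict(X[i]) == y[i] for i in test) / len(test)


def cv_accuracy(X, y, k=10):
    """Mean test accuracy over k stratified folds."""
    scores = [nearest_centroid_accuracy(X, y, tr, te)
              for tr, te in stratified_kfold(y, k)]
    return sum(scores) / len(scores)


# Synthetic stand-in data: 30 subjects per class, 4 features per modality.
rng = random.Random(42)
labels = [c for c in ("Healthy", "CAD", "HF") for _ in range(30)]
shift = {"Healthy": 0.0, "CAD": 1.0, "HF": 2.0}
pulse = [[rng.gauss(shift[c], 1.0) for _ in range(4)] for c in labels]
voice = [[rng.gauss(shift[c], 1.0) for _ in range(4)] for c in labels]
fused = [fuse(p, v) for p, v in zip(pulse, voice)]

results = {name: cv_accuracy(X, labels)
           for name, X in (("pulse-only", pulse), ("voice-only", voice), ("fused", fused))}
for name, acc in results.items():
    print(f"{name}: {acc:.3f}")
```

Stratifying the folds keeps the Healthy, CAD, and HF proportions similar in every split, so the three input configurations are scored on comparable test sets. The numbers printed by this sketch reflect only the synthetic data and say nothing about the study's results.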
This study provides empirical support for the feasibility of digitally integrating selected TCM-informed diagnostic signals within a prediction model framework. By digitizing and integrating pulse and voice signals, we developed a multimodal prediction model with potential utility for non-invasive cardiovascular screening in primary care, while linking the TCM concept of combined diagnosis with contemporary artificial intelligence and cardiovascular physiology.
Lyu et al. (Fri,) conducted an observational study in cardiovascular disease (coronary artery disease and heart failure) (n=747; 553 subjects for model development and 194 in the external validation cohort). A multimodal deep learning model (pulse wave and vocal signals) was compared with single-modality models (pulse-only and voice-only) on accuracy and AUC. In the external validation cohort, the multimodal model integrating radial artery pulse wave and vocal signals achieved an accuracy of 0.7165 and an AUC of 0.8454 for detecting cardiovascular disease.
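For readers unfamiliar with the two reported metrics, the following minimal sketch shows how accuracy and AUC can be computed from model outputs. It rests on assumptions: the record does not say how AUC was aggregated across the three classes (Healthy, CAD, HF), so a macro-averaged one-vs-rest AUC is used here purely as one common choice, and the scores are made-up toy values. The function names are hypothetical.

```python
def binary_auc(scores, is_positive):
    """AUC as the probability that a random positive outranks a random negative.

    Ties count as half a win (the Mann-Whitney formulation).
    """
    pos = [s for s, p in zip(scores, is_positive) if p]
    neg = [s for s, p in zip(scores, is_positive) if not p]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted class matches the true class."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_ovr_auc(y_true, class_scores):
    """Average the one-vs-rest AUC over classes (assumed aggregation scheme)."""
    classes = sorted(set(y_true))
    aucs = [
        binary_auc([row[c] for row in class_scores], [t == c for t in y_true])
        for c in classes
    ]
    return sum(aucs) / len(aucs)


# Toy per-class scores for four hypothetical subjects (not study data).
y_true = ["Healthy", "CAD", "HF", "CAD"]
class_scores = [
    {"Healthy": 0.70, "CAD": 0.20, "HF": 0.10},
    {"Healthy": 0.30, "CAD": 0.50, "HF": 0.20},
    {"Healthy": 0.10, "CAD": 0.30, "HF": 0.60},
    {"Healthy": 0.40, "CAD": 0.25, "HF": 0.35},
]
y_pred = [max(row, key=row.get) for row in class_scores]

print("accuracy:", accuracy(y_true, y_pred))
print("macro one-vs-rest AUC:", macro_ovr_auc(y_true, class_scores))
```

Accuracy depends on the single predicted label per subject, whereas AUC depends on how well the scores rank subjects, which is why a model can report an AUC noticeably higher than its accuracy.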