• We benchmark LMM, GAMM, GNMM, and NME on longitudinal UPDRS from voice.
• GAMM achieved the best test MSE (6.56); LMM was close (7.70) on held-out visits.
• Deep neural variants underperformed mixed-effects baselines on this dataset.
• Voice telemonitoring features (jitter, shimmer, HNR) predict PD severity.
• Public UCI dataset; code release supports reproducibility.

Predicting Parkinson's Disease (PD) progression is crucial for personalized treatment, and voice biomarkers offer a promising non-invasive method for tracking symptom severity through telemonitoring. However, analyzing this longitudinal data is challenging due to inherent within-subject correlations, the small sample sizes typical of clinical trials, and complex patient-specific progression patterns. While deep learning offers high theoretical flexibility, its application to small-cohort longitudinal studies remains under-explored compared to traditional statistical methods. This study presents an application of the Neural Mixed Effects (NME) framework to Parkinson's telemonitoring, benchmarking it against Generalized Neural Network Mixed Models (GNMMs) and semi-parametric Generalized Additive Mixed Models (GAMMs). Using the Oxford Parkinson's telemonitoring voice dataset (N = 42), we demonstrate that while neural architectures offer flexibility, they are prone to significant overfitting in small-sample regimes. Our results indicate that GAMMs provide the optimal balance, achieving superior predictive accuracy (MSE 6.56) compared to neural baselines (MSE > 90) while maintaining clinical interpretability. We discuss the critical implications of these findings for developing robust, deployable telemonitoring systems where data scarcity is a constraint, highlighting the necessity for larger, diverse datasets for neural model validation.
Tong et al. (Sun,) studied this question.