Existing studies have confirmed the value of Internet search data in predicting infectious disease trends. This study aims to develop a predictive model using the Baidu Index to examine the relationship between the search volume of syphilis-related keywords and the prevalence of syphilis in China. We collected daily reported syphilis case counts and Baidu search volumes for syphilis-related keywords from January 2011 to March 2025. Keywords highly correlated with syphilis incidence were identified using Spearman rank correlation and time series cross-correlation analysis, and were incorporated into a composite search index (CSI). Based on monthly case numbers and the CSI, we developed a seasonal autoregressive integrated moving average (ARIMA) model and an ARIMAX model with the CSI as an external regressor. Model performance was evaluated using mean absolute error (MAE), root mean square error (RMSE), and mean absolute percentage error (MAPE). From January 2011 to March 2025, a total of 7,289,225 syphilis cases were reported in China, with a monthly average of 42,627 cases. Six keywords with a time lag of 0 months were included in the CSI. The ARIMAX (2,0,0)(2,1,0)₁₂ + CSI (Lag = 0) model outperformed the seasonal ARIMA (2,0,0)(2,1,0)₁₂ model, with lower MAE (5184.36 vs. 9007.81), RMSE (6010.31 vs. 10,328.01), and MAPE (10.0% vs. 16.0%). The ARIMAX model incorporating Baidu search data predicted syphilis incidence more accurately than the seasonal ARIMA model fitted to case counts alone. It can serve as a tool for early warning and prediction, supplementing conventional surveillance systems.
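The three evaluation metrics named above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function names and the four monthly case counts are hypothetical, and MAPE is written in its usual form as the mean of absolute errors relative to the observed values, expressed as a percentage.

```python
import math


def mae(actual, predicted):
    """Mean absolute error: average size of the forecast errors."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean square error: penalizes large errors more than MAE."""
    squared = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(squared / len(actual))


def mape(actual, predicted):
    """Mean absolute percentage error, in percent (observed values must be nonzero)."""
    ratios = sum(abs((a - p) / a) for a, p in zip(actual, predicted))
    return 100.0 * ratios / len(actual)


# Hypothetical monthly case counts and forecasts, for illustration only.
actual = [40000, 42000, 45000, 43000]
predicted = [38000, 43000, 47000, 40000]

print(f"MAE  = {mae(actual, predicted):.2f}")
print(f"RMSE = {rmse(actual, predicted):.2f}")
print(f"MAPE = {mape(actual, predicted):.2f}%")
```

Computing all three on the same hold-out period for both models gives a like-for-like comparison: MAE and RMSE are in units of cases, while MAPE is scale-free and therefore easier to compare across diseases or regions.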
Yuan et al. (Wed,) studied this question.