To the Editor,

Guo et al.¹ reported a multi-institutional study that developed and validated a nomogram to estimate preoperative hemorrhage risk in pediatric moyamoya disease, drawing on a large dataset spanning multiple centers and years. However, several aspects of the study’s design and implementation warrant closer attention. This letter adheres to the TITAN Guidelines 2025, which govern the declaration and use of artificial intelligence in research².

First, the cohort spans 2004–2022, a period over which angiographic hardware, classification guidelines, and grading standards for anterior choroidal artery (AChA) and posterior communicating artery (PComA) dilation have evolved substantially. Even within the same institution, advances in image quality and changes in grading practice can shift classifications, introducing temporal heterogeneity into both propensity score estimation and regression modeling. Updated diagnostic criteria³ and the advent of high-resolution imaging techniques⁴,⁵ illustrate how practice has changed over time. A temporal stratification analysis, inclusion of diagnosis year as a covariate, or retrospective re-grading with harmonized criteria could address this.

Second, the reduction in sample size from 1350 to 392 following 1:3 propensity score matching is substantial. Although the authors demonstrate post-match covariate balance (P > 0.05), they present no analysis of how matching altered the distributions of key predictors such as age, disease duration, and angiographic grades. Even well-balanced matched sets can differ in representativeness from the source population, potentially limiting external validity. Recent methodological reviews caution that propensity score matching can paradoxically reduce efficiency and bias the resulting estimates by discarding large subsets of the cohort⁶,⁷. Formal quantification of pre- versus post-match distribution shifts (e.g., using Kolmogorov–Smirnov tests⁸) or the use of weighting approaches could mitigate this risk.
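As a rough illustration of the distribution-shift check suggested above, the following Python sketch computes the two-sample Kolmogorov–Smirnov statistic for one predictor before and after matching. The function name `ks_statistic` and the age values are hypothetical and invented for illustration; they are not data from Guo et al. Because a matched set is a subset of the source cohort, the two samples are not independent, so the statistic is probably best read as a descriptive measure of shift rather than as a formal test.

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Two-sample KS statistic D = max |ECDF_a(x) - ECDF_b(x)|."""
    a, b = sorted(sample_a), sorted(sample_b)
    d = 0.0
    # Both empirical CDFs are right-continuous step functions, so the
    # largest gap between them is attained at one of the observed values.
    for x in a + b:
        gap = abs(bisect_right(a, x) / len(a) - bisect_right(b, x) / len(b))
        d = max(d, gap)
    return d


# Synthetic ages in years, invented purely for illustration (not study data).
source_ages = [3, 4, 5, 6, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16]
matched_ages = [8, 9, 10, 11, 12, 13, 14, 15]

# D near 0 suggests the matched set resembles the source cohort on this
# predictor; D near 1 suggests the two distributions barely overlap.
print(f"KS D for age, source vs matched: {ks_statistic(source_ages, matched_ages):.3f}")
```

If a p-value is wanted, `scipy.stats.ks_2samp` returns both the statistic and a p-value, subject to the independence caveat noted above.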
Third, the decision to map total nomogram points (ΣPoints) to hemorrhage probability via a cubic polynomial fit (risk = −7.8 × 10⁻⁷ Σ³ + 0.00038 Σ² − 0.0535 Σ + 2.389; Fig. 6B) raises concerns about model robustness. The polynomial is unbounded: an out-of-range input such as Σ = 0, while clinically improbable, returns 2.389, which is not a valid probability. Without explicit input limits or output clipping, the publicly available tool could return invalid probabilities for erroneous entries. In contrast, logistic transformations, the standard in clinical prediction modelling⁹, are monotonic and bound risk estimates strictly between 0 and 1. Their adoption would improve stability and interpretability.

Finally, although the analysis is cross-sectional, there is potential for survivor bias analogous to immortal-time bias if classification and predictor measurement occur substantially after diagnosis. Non-hemorrhagic cases must remain event-free during this pre-classification interval, whereas early hemorrhagic cases may be excluded or misclassified. Such differences in “immortal” periods can distort risk estimates. Recent studies of pediatric moyamoya disease with long follow-up emphasize that the timing of classification relative to hemorrhagic events is critical when interpreting risk trajectories¹⁰.

In sum, temporal consistency of grading, post-matching representativeness, probability mapping constraints, and potential survivor-bias mechanisms merit further consideration. Addressing these points would strengthen both the scientific validity and the clinical usability of the proposed tool.
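The bounding concern raised in the third point can be made concrete with a short Python sketch. It evaluates the cubic mapping using the coefficients quoted above, shows a clipped variant as a stopgap, and contrasts both with an inverse-logit transform. The function names are hypothetical, and `logistic_risk` takes a generic linear predictor as an assumption, because the coefficients of the underlying regression model are not reproduced in this letter.

```python
import math


def cubic_risk(total_points):
    """Cubic mapping from total nomogram points to risk (coefficients as quoted from Fig. 6B)."""
    s = total_points
    return -7.8e-7 * s**3 + 0.00038 * s**2 - 0.0535 * s + 2.389


def clipped_risk(total_points):
    """Stopgap: force the cubic output into [0, 1]. This hides invalid values rather than fixing the model."""
    return min(1.0, max(0.0, cubic_risk(total_points)))


def logistic_risk(linear_predictor):
    """Inverse logit, written to avoid overflow for large negative inputs.

    The linear predictor is hypothetical here; it stands for intercept plus
    weighted predictors from a fitted logistic model.
    """
    if linear_predictor >= 0:
        return 1.0 / (1.0 + math.exp(-linear_predictor))
    z = math.exp(linear_predictor)
    return z / (1.0 + z)


# Probe the cubic mapping at a few total-point values, including the
# clinically improbable input of 0 discussed in the text.
for points in (0, 100, 300):
    print(points, round(cubic_risk(points), 3), clipped_risk(points))

# The inverse logit increases with its input and stays inside (0, 1)
# for any moderate linear predictor.
for lp in (-5.0, 0.0, 5.0):
    print(lp, round(logistic_risk(lp), 4))
```

Clipping only masks out-of-range outputs. A logistic mapping avoids them by construction, which is why it seems the more defensible choice for a public calculator.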