Abstract

1.1 Background
Active Crohn’s disease (CD) manifests as bowel inflammation, leading to chronic damage. The reference standard for active CD must integrate evidence from multiple tests, each with suboptimal accuracy alone, to first classify CD presence and then activity. Expert panel consensus (EPC) can synthesise such evidence but is time-consuming, resource-intensive, and vulnerable to group biases. Latent class modelling (LCM) offers a promising statistical alternative; however, the absence of comparisons against EPC has limited broader adoption. Thus, we conducted the first comparative evaluation of LCM and EPC as reference standards for a diagnostic test accuracy (DTA) study.

1.2 Methods
We retrospectively analysed 284 patients with newly diagnosed or suspected relapsing CD from a multi-centre clinical trial. We focused on terminal ileal CD (TICD), where test evidence was strongest. Panels classified active TICD using all available clinical, imaging, endoscopic, histological, and biochemical evidence collected six months after recruitment. Following expert clinical guidance, we estimated a two-class random-effects Bayesian model using flat priors and binary scores from seven tests (magnetic resonance imaging (MRI), ultrasound, endoscopy, histopathology, C-reactive protein, faecal calprotectin, and Harvey-Bradshaw index). We compared LCM and EPC classifications for active TICD, explored reasons for disagreement, assessed model stability using bootstrapping, and re-estimated the original study’s primary outcome.

1.3 Results
For classifying active TICD, positive agreement between LCM and EPC was 87% (95% CI 83, 91) and negative agreement was 70% (95% CI 65, 75). Disagreement mainly arose because the model could include MRI and ultrasound, whereas the panels could not. LCM classifications for active TICD were consistent across bootstrap samples in 80% (227/284) of patients, demonstrating model stability. LCM produced an estimate of the original study’s primary outcome similar to that obtained using EPC.

1.4 Conclusions
LCM demonstrated efficiency, stability, and comparable accuracy estimates to EPC, supporting its viability as an active TICD reference standard for DTA studies. Our findings justify further evaluation of LCM across diverse diseases and study designs to encourage its broader adoption for diagnostic research.
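The agreement figures reported in the Results can be illustrated with a minimal sketch. The abstract does not state how agreement or its confidence intervals were computed, so this example makes two assumptions: that positive and negative agreement are the proportions of EPC-positive and EPC-negative patients given the same label by LCM, and that a Wilson score interval is used. The function names and the toy labels are hypothetical and are not study data.

```python
from math import sqrt


def wilson_interval(successes, total, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 gives roughly 95%)."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = successes / total
    denom = 1 + z**2 / total
    centre = (p + z**2 / (2 * total)) / denom
    half = z * sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return centre - half, centre + half


def percent_agreement(lcm_labels, epc_labels):
    """Positive and negative agreement of LCM with EPC (1 = active, 0 = not active).

    Assumption: EPC is treated as the comparator, so the denominators are
    the EPC-positive and EPC-negative counts respectively.
    """
    if len(lcm_labels) != len(epc_labels):
        raise ValueError("label lists must have equal length")
    pairs = list(zip(lcm_labels, epc_labels))
    pos_total = sum(1 for _, e in pairs if e == 1)
    neg_total = sum(1 for _, e in pairs if e == 0)
    pos_agree = sum(1 for l, e in pairs if e == 1 and l == 1)
    neg_agree = sum(1 for l, e in pairs if e == 0 and l == 0)
    return {
        "positive": (pos_agree / pos_total, wilson_interval(pos_agree, pos_total)),
        "negative": (neg_agree / neg_total, wilson_interval(neg_agree, neg_total)),
    }


# Hypothetical toy labels for ten patients (illustration only, not study data).
epc_labels = [1, 1, 1, 1, 0, 0, 0, 0, 1, 0]
lcm_labels = [1, 1, 1, 0, 0, 0, 1, 0, 1, 0]

result = percent_agreement(lcm_labels, epc_labels)
for name, (estimate, (lower, upper)) in result.items():
    print(f"{name} agreement: {estimate:.2f} (95% CI {lower:.2f}, {upper:.2f})")
```

With only ten toy patients the intervals are wide; the narrower intervals in the Results reflect the study's 284 patients. Other definitions of agreement, or other interval methods, would give somewhat different numbers.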
Parry et al. (Tue,) studied this question.