May 7, 2026 | Open Access

Beyond demographic tables: integrating data quality in clinical trial representativeness

Abstract

Introduction: Clinical trial representativeness is essential for ensuring that study findings generalise to target treatment populations. Current assessment approaches rely on subjective demographic comparisons that lack standardisation and fail to account for data quality. Existing data quality frameworks assess completeness at the dataset level as the availability of required data values but do not address cohort-specific suitability for representativeness analysis. This study proposes that for clinical trial datasets, the completeness dimension should be interpreted through two complementary aspects: demographic coverage (the degree to which trial demographics represent the target population) and dataset completeness (the availability of required demographic data for analysis). Missing demographic data can compromise similarity assessments, yet no standardised approach exists to integrate both aspects of completeness into representativeness evaluation.
Methods: This work introduces a quantitative framework that operationalises this dual interpretation of completeness. Coverage is measured using the Jensen-Shannon distance across demographic distributions, while dataset completeness quantifies the percentage of valid data available. These metrics are aggregated through clinically informed weights into a single suitability score. The framework was validated using simulated trial cohorts sampled from the MIMIC-III database and compared against a target sepsis population of 1,315 patients extracted from the same dataset. The validation employed simple random sampling with varying cohort sizes to simulate Phase II/III single-centre trial scenarios.
Results: [...] indicating that high representativeness can be achieved in smaller trials. Patient weight and language showed the lowest completeness, reflecting dataset limitations that can be mitigated through feature weighting.
Discussion: The framework's interpretability and reproducibility suggest potential for integration into trial design workflows, supporting evidence-based enrolment strategies and transparent regulatory evaluation. By providing quantifiable metrics, this work advances clinical trial quality assessment beyond qualitative comparisons and establishes a foundation for systematic evaluation of trial representativeness.
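
The Methods summary above can be made concrete with a small sketch. It is illustrative only and is not the authors' implementation: the function names, the definition of coverage as one minus the Jensen-Shannon distance, the per-feature product of coverage and completeness, and the toy weights are all assumptions, since the abstract does not state the aggregation formula.

```python
import math
from collections import Counter


def distribution(values, categories):
    """Relative frequency of each category among non-missing values."""
    valid = [v for v in values if v is not None]
    counts = Counter(valid)
    total = len(valid)
    return [counts[c] / total if total else 0.0 for c in categories]


def js_distance(p, q):
    """Jensen-Shannon distance with base-2 logs, bounded in [0, 1]."""
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        # Terms with x_i == 0 contribute nothing to the divergence.
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)

    divergence = 0.5 * kl(p, m) + 0.5 * kl(q, m)
    return math.sqrt(max(divergence, 0.0))


def completeness(values):
    """Fraction of records with a non-missing value (None marks missing)."""
    return sum(v is not None for v in values) / len(values) if values else 0.0


def suitability(trial, target, weights):
    """Weighted mean over features of coverage * completeness.

    ASSUMPTION: coverage = 1 - Jensen-Shannon distance, and the two
    metrics are combined by multiplication; the source gives no formula.
    """
    total_weight = sum(weights.values())
    score = 0.0
    for feature, weight in weights.items():
        observed = trial[feature] + target[feature]
        categories = sorted({v for v in observed if v is not None})
        coverage = 1.0 - js_distance(
            distribution(trial[feature], categories),
            distribution(target[feature], categories),
        )
        score += weight * coverage * completeness(trial[feature])
    return score / total_weight


# Toy data with hypothetical features and weights (not from MIMIC-III).
target = {"sex": ["F", "M", "F", "M"], "language": ["en", "en", "es", "en"]}
trial = {"sex": ["F", "M", "M", "M"], "language": ["en", None, None, "es"]}
weights = {"sex": 2.0, "language": 1.0}
print(round(suitability(trial, target, weights), 3))
```

In this sketch a feature with many missing values, such as language in the toy trial cohort, lowers the overall score in proportion to its weight, which mirrors the abstract's remark that low completeness can be mitigated through feature weighting.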
