An electronic phenotyping algorithm requiring two diagnosis codes plus treatment evidence improved the positive predictive value for prostate cancer to 0.927 compared to 0.897 for diagnosis codes alone.
Cross-Sectional (n=94,446)
How do different electronic phenotyping algorithms for prostate cancer perform compared to self-report, and how do they impact genetic association studies?
Including treatment data in electronic phenotyping algorithms improves the positive predictive value for identifying prostate cancer (0.927 vs 0.897), lowers the negative predictive value (0.864 vs 0.927), and changes the effect estimates in downstream genetic association studies.
Positive predictive value: 0.927 (two diagnosis codes plus treatment) vs 0.897 (two diagnosis codes alone)
1619
Background: The All of Us Research Program enables large scale population-based genetic association studies of cancer. All of Us does not contain cancer registry data, risking misclassification of cancer diagnosis. Thus, validated electronic phenotyping algorithms for cancer diagnoses are needed. We sought to develop and validate prostate cancer (PCa) phenotyping algorithms in All of Us, and assess the impact of different algorithms on gene-cancer associations.
Methods: We conducted a cross-sectional analysis of All of Us male participants aged ≥50 years with structured electronic health record (EHR), personal health history survey, and whole genome sequencing (WGS) data (Duke IRB Pro00119057). Using self-report of PCa history as the gold standard, we developed and evaluated two electronic phenotyping algorithms: (1) two PCa diagnosis codes; or (2) two PCa diagnosis codes plus evidence of PCa treatment with radical prostatectomy, radiotherapy, or androgen deprivation. For each algorithm, we computed positive predictive value (PPV), and negative predictive value (NPV) with 95% confidence intervals (CI). Using each algorithm, we conducted genetic association studies between PCa and rare pathogenic variants (RPVs) in PCa predisposition genes (ATM, BRCA1, BRCA2, HOXB13 G84E) among male participants with structured EHR and WGS data. Included RPVs had pathogenic/likely pathogenic designation in ClinVar with at least two stars. Gene-based odds ratios (OR) and 95% confidence intervals (CI) were calculated using logistic regression, adjusting for age and top 5 principal components.
Results: We included 21,883 males in the validation study of PCa phenotyping algorithms, 3,961 (18.1%) of whom reported a personal history of prostate cancer. Algorithm 1 demonstrated a PPV of 0.897 (95% CI 0.885 – 0.907) and NPV of 0.927 (95% CI 0.923 – 0.931). Algorithm 2 demonstrated a PPV of 0.927 (95% CI 0.910 – 0.942) and NPV of 0.864 (95% CI 0.860 – 0.869).
In the study of gene-PCa associations, 94,446 male participants were included, except for HOXB13 G84E (n = 59,915), which was limited to those of European ancestry. Associations between RPVs in PCa predisposition genes and PCa diagnoses using the two algorithms are shown in the Table.
Conclusions: The inclusion of PCa treatment improves the PPV of PCa electronic phenotyping algorithms. PCa phenotyping definitions influence the results of downstream genetic association studies, and highlight the need for accurate and validated electronic phenotyping algorithms when using All of Us.

Gene           OR (Algorithm 1)   95% CI (Algorithm 1)   OR (Algorithm 2)   95% CI (Algorithm 2)
ATM            2.10               1.47 - 2.92            2.68               1.67 - 4.08
BRCA1          1.11               0.73 - 1.61            1.16               0.61 - 1.98
BRCA2          2.14               1.53 - 2.92            3.11               2.04 - 4.56
HOXB13 G84E    3.88               2.75 - 5.39            3.31               1.98 - 5.22

OR = odds ratio. CI = confidence interval. Algorithm 1 = two prostate cancer diagnosis codes. Algorithm 2 = two prostate cancer diagnosis codes plus treatment.
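The odds ratios in the Table come from logistic regression adjusted for age and the top 5 principal components. As a rough illustration of what an odds ratio and its 95% CI measure, the sketch below computes an unadjusted odds ratio with a Wald interval from a 2x2 carrier-by-case table. This is a simplification, not the study's method: the function name `odds_ratio_wald` and the example counts are hypothetical, and an unadjusted estimate would not be expected to match the adjusted values reported above.

```python
import math

# Two-sided 95% normal quantile.
Z_95 = 1.959963984540054


def odds_ratio_wald(carrier_cases, carrier_controls,
                    noncarrier_cases, noncarrier_controls, z=Z_95):
    """Unadjusted odds ratio with a Wald confidence interval.

    Assumption: a plain 2x2 table with no covariate adjustment, unlike the
    adjusted logistic regression described in the abstract.
    Returns (odds_ratio, lower, upper).
    """
    cells = (carrier_cases, carrier_controls,
             noncarrier_cases, noncarrier_controls)
    if min(cells) <= 0:
        raise ValueError("all four cells must be positive")

    odds_ratio = (carrier_cases * noncarrier_controls) / (
        carrier_controls * noncarrier_cases)
    # Standard error of log(OR) is the root of the summed reciprocal counts.
    se_log_or = math.sqrt(sum(1.0 / n for n in cells))
    log_or = math.log(odds_ratio)
    return (odds_ratio,
            math.exp(log_or - z * se_log_or),
            math.exp(log_or + z * se_log_or))


if __name__ == "__main__":
    # Hypothetical counts, NOT taken from the study.
    estimate, lower, upper = odds_ratio_wald(20, 80, 10, 90)
    print(f"OR = {estimate:.2f} (95% CI {lower:.2f} - {upper:.2f})")
```

Because case status depends on the phenotyping algorithm, switching from Algorithm 1 to Algorithm 2 changes which participants fall in the case cells, which is one way the choice of phenotype definition can shift the estimated association.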
Creech et al. conducted a cross-sectional study of prostate cancer phenotyping in All of Us (n=94,446 in the genetic association analysis; n=21,883 in the algorithm validation). Algorithm 2 (two diagnosis codes plus treatment) was compared with Algorithm 1 (two diagnosis codes) on positive predictive value (PPV) for prostate cancer.
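The PPV and NPV comparison rests on a 2x2 table of algorithm result against self-report. A minimal sketch of that calculation follows. The abstract does not state which interval method was used, so the Wilson score interval here is an assumption, and the function names and example counts are hypothetical rather than taken from the study.

```python
import math

# Two-sided 95% normal quantile.
Z_95 = 1.959963984540054


def wilson_interval(successes, total, z=Z_95):
    """Wilson score interval for a binomial proportion (assumed method)."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = successes / total
    denom = 1.0 + z * z / total
    centre = (p + z * z / (2.0 * total)) / denom
    half_width = z * math.sqrt(
        p * (1.0 - p) / total + z * z / (4.0 * total * total)) / denom
    return max(0.0, centre - half_width), min(1.0, centre + half_width)


def predictive_values(tp, fp, tn, fn):
    """PPV and NPV with intervals, using self-report as the reference.

    tp: algorithm positive, self-report positive
    fp: algorithm positive, self-report negative
    tn: algorithm negative, self-report negative
    fn: algorithm negative, self-report positive
    Returns a dict mapping "ppv"/"npv" to (estimate, lower, upper).
    """
    ppv = tp / (tp + fp)
    npv = tn / (tn + fn)
    return {
        "ppv": (ppv, *wilson_interval(tp, tp + fp)),
        "npv": (npv, *wilson_interval(tn, tn + fn)),
    }


if __name__ == "__main__":
    # Hypothetical counts, NOT taken from the study.
    for name, (est, lower, upper) in predictive_values(90, 10, 180, 20).items():
        print(f"{name.upper()} = {est:.3f} (95% CI {lower:.3f} - {upper:.3f})")
```

A stricter definition such as Algorithm 2 moves some true cases from the algorithm-positive row to the algorithm-negative row, which is consistent with the reported pattern of higher PPV and lower NPV.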