Abstract

Motivation
The Gene Damage Index (GDI) quantifies the cumulative mutational damage of protein-coding genes in the general population and helps prioritize candidate disease genes in sequencing studies. However, the original GDI is influenced by coding sequence length and does not account for gene-specific differences in variant deleteriousness. We developed GDIv2, an updated framework that corrects for coding sequence length and incorporates gene-specific normalization of CADD scores to improve discrimination between disease-relevant and non-relevant genes.

Results
Four GDIv2 implementations were generated using the 1000 Genomes Project and gnomAD datasets for both the GRCh37 and GRCh38 genome builds. Benchmarking against the original GDI showed that all GDIv2 versions significantly improved discrimination between relevant and accessory genes, reduced the erroneous exclusion of relevant genes, and increased the exclusion of accessory genes. GDIv2_1kGP37 achieved the highest AUC and excluded 24.6% of accessory genes while retaining 96.7% of relevant genes. GDIv2_1kGP37 achieved AUC values similar to those of RVIS, LOEUF, s_het, and CoNeS. Combining GDIv2_1kGP37 with CoNeS and LOEUF further improved filtering, excluding 42.7% of accessory genes while removing only 2.4% of relevant genes.

Availability and implementation
GDIv2 resources are freely available at https://hgidsoft.rockefeller.edu/GDI/GDIv2.html.
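The benchmark figures above reduce to two kinds of quantity: the fraction of accessory genes excluded versus relevant genes retained at a cutoff, and an AUC measuring how well a gene-level score separates the two groups. The sketch below shows how such numbers can be computed. It assumes that genes with a higher damage score are the ones filtered out and that each gene carries a relevant/accessory label; the function names, the strict `>` cutoff rule, and the toy scores are hypothetical illustrations, not the authors' pipeline or data.

```python
from typing import Sequence, Tuple


def filter_rates(scores: Sequence[float], relevant: Sequence[bool],
                 threshold: float) -> Tuple[float, float]:
    """Hypothetical filter: exclude every gene whose score exceeds `threshold`.

    Returns (fraction of accessory genes excluded, fraction of relevant genes retained).
    """
    accessory = [s for s, r in zip(scores, relevant) if not r]
    rel = [s for s, r in zip(scores, relevant) if r]
    excluded_accessory = sum(s > threshold for s in accessory) / len(accessory)
    retained_relevant = sum(s <= threshold for s in rel) / len(rel)
    return excluded_accessory, retained_relevant


def auc(scores: Sequence[float], relevant: Sequence[bool]) -> float:
    """Mann-Whitney form of the ROC AUC: the probability that a random accessory
    gene scores higher than a random relevant gene (ties count as 1/2)."""
    accessory = [s for s, r in zip(scores, relevant) if not r]
    rel = [s for s, r in zip(scores, relevant) if r]
    wins = 0.0
    for a in accessory:
        for r in rel:
            if a > r:
                wins += 1.0
            elif a == r:
                wins += 0.5
    return wins / (len(accessory) * len(rel))


# Toy example with made-up scores (not GDIv2 values).
scores = [0.1, 0.4, 0.35, 0.8, 0.9, 0.2]
relevant = [True, True, False, False, False, True]
print(filter_rates(scores, relevant, threshold=0.5))  # (0.666..., 1.0)
print(auc(scores, relevant))                          # 0.888...
```

The pairwise AUC loop is quadratic and only meant to make the definition explicit; for genome-wide gene sets a rank-based implementation (for example a library ROC routine) would be the practical choice. Scores where low values indicate intolerance, such as LOEUF, would need their direction flipped before being filtered or compared this way.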
Talouarn et al. studied this question.