Abstract

Motivation
Cancer is driven by genetic changes, known as mutations, that lead to uncontrolled cell division. The functional significance of a vast number of these somatic mutations is unknown, and determining it is one of the major challenges in cancer research. In this study, we performed an integrative analysis of 30 tumor types using pan-cancer mutation data collected from the COSMIC database. We analysed a set of 61,364 missense mutations (57,535 drivers and 3,829 passengers) from 682 cancer-causing genes and derived various features from amino acid sequences, AlphaFold-predicted structures, and amino acid contact networks. We observed that motif-based preference, neighboring residue information, residue depth, and disordered regions around the mutation site are important for discriminating drivers from passengers.

Results
We further developed cancer-specific computational models based on deep learning to discriminate driver (cancer-causing) mutations from passenger mutations. Integrating AlphaFold-predicted structure information improved the pathogenicity prediction of mutations. Our method achieved an average classification accuracy of 84.06% with 10-fold cross-validation.

Availability and implementation
The prediction server is available at https://web.iitm.ac.in/bioinfo2/PANDriver/index.html. We envisage that the AI-based prediction models will be an important tool for identifying driver mutations and can extend the scope of precision medicine for cancer.
Pandey et al. studied this question.
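The abstract reports an accuracy averaged over 10-fold cross-validation. As a minimal sketch of what that averaging means, the block below splits a labeled set into k folds, scores each held-out fold, and takes the mean of the fold accuracies. It is not the authors' pipeline: the function names (`k_fold_splits`, `majority_label`, `mean_cv_accuracy`) are hypothetical, the "model" is a toy majority-class stand-in rather than a deep learning classifier, the labels are synthetic, and the folds are contiguous and unshuffled, whereas a real evaluation would typically shuffle or stratify.

```python
from typing import List, Sequence, Tuple


def k_fold_splits(n_samples: int, k: int = 10) -> List[Tuple[List[int], List[int]]]:
    """Return (train_indices, test_indices) pairs for k contiguous folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    # Spread the remainder over the first `extra` folds so sizes differ by at most 1.
    base, extra = divmod(n_samples, k)
    splits = []
    start = 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        splits.append((train, test))
        start += size
    return splits


def majority_label(labels: Sequence[int]) -> int:
    """Toy stand-in for model training: return the most common training label."""
    labels = list(labels)
    return max(set(labels), key=labels.count)


def mean_cv_accuracy(labels: Sequence[int], k: int = 10) -> float:
    """Average the per-fold accuracy of the toy majority-class predictor."""
    accuracies = []
    for train, test in k_fold_splits(len(labels), k):
        # "Train" on the k-1 remaining folds, then score the held-out fold.
        predicted = majority_label([labels[i] for i in train])
        correct = sum(1 for i in test if labels[i] == predicted)
        accuracies.append(correct / len(test))
    return sum(accuracies) / len(accuracies)


if __name__ == "__main__":
    # Synthetic labels for illustration only (1 = driver, 0 = passenger).
    toy_labels = [1] * 15 + [0] * 5
    print(f"mean 10-fold accuracy: {mean_cv_accuracy(toy_labels, k=10):.2f}")
```

In practice the toy predictor would be replaced by the trained classifier, and each fold would be refit from scratch so that no held-out mutation influences the model that scores it.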