October 16, 2025 · Open Access

Integrative machine learning predicts activating kinase mutations for precision oncology

Key Points

The classifier achieves an AUROC of 0.85 and a balanced accuracy of 0.76 on 1,003 kinase mutations spanning 110 kinases, substantially outperforming existing bioinformatics and general-purpose variant effect predictors.
The model is trained on a multi-modal feature set that combines residue-level biochemical changes, protein language model sequence embeddings, and structural descriptors of kinase-ATP-substrate complexes.
Detailed structural features are available for only 21% of mutations; these are used as privileged information during training to impute the missing structural data for the remaining ~79%, so new mutations can be scored without structural inputs.
This approach underlines the importance of quantifying sequence-structure-function relationships in advancing targeted cancer therapies.
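
For readers less familiar with the two metrics quoted above, the following minimal sketch shows how balanced accuracy and AUROC are commonly computed for a binary activating / non-activating task. The function names, the 0.5 decision threshold, and the toy labels and scores are illustrative assumptions and are not taken from the paper.

```python
def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity and specificity for binary labels (1 = activating)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    pos = sum(1 for t in y_true if t == 1)
    neg = len(y_true) - pos
    return 0.5 * (tp / pos + tn / neg)


def auroc(y_true, scores):
    """Probability that a random positive outscores a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy labels and scores, invented for illustration only (not data from the paper).
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1, 0.05]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

print("balanced accuracy:", balanced_accuracy(y_true, y_pred))
print("AUROC:", auroc(y_true, scores))
```

Balanced accuracy weights the two classes equally regardless of how many examples each has, while AUROC is independent of the decision threshold, which is one reason the two are often reported together.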

Abstract

Kinases are enzymes that catalyze phosphorylation and play crucial roles in a myriad of cellular regulatory processes and homeostasis. Patient-specific genetic mutations that aberrantly activate kinases can profoundly influence cancer progression and alter drug efficacy. Predicting the impact of such missense mutations across the human kinome on protein function and cellular signaling is therefore a critical step toward personalized targeted therapy. Here, we present Kinome AI, an integrative machine learning framework that classifies kinase missense mutations as activating or non-activating. Kinome AI is trained on a rich multi-modal feature set, including residue-level biochemical changes, sequence embeddings from a protein language model, and structural descriptors of kinase-ATP-substrate complexes derived from molecular modeling. Notably, detailed structural features were available for only 21% of mutants; we leverage these as privileged information during training to impute missing structural data for the remaining ~79%. This strategy boosts performance without requiring structural inputs for new (unseen) mutations. The resulting classifier achieves an area under the receiver operating characteristic curve (AUROC) of 0.85 and a balanced accuracy (BACC) of 0.76 across 1,003 mutations spanning 110 different kinases, substantially outperforming existing bioinformatics and general-purpose variant effect predictors. This work provides a robust approach to quantify sequence-structure-function relationships of cancer-driving kinase mutations, paving the way for improved personalized cancer treatment.
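
The idea of using structural features as privileged information can be illustrated with a deliberately small sketch. The abstract does not describe the actual imputation model, feature encoding, or classifier, so everything below is an assumption made for illustration: a single always-available "base" feature, a single structural feature, a least-squares line as the imputer, a nearest-centroid classifier, and invented toy values.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares for y ~ slope * x + intercept (one feature)."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


# Toy mutants: (base_feature, structural_feature or None, label).
# Values are invented; label 1 = activating, 0 = non-activating.
mutants = [
    (0.10, 0.3, 0), (0.20, 0.5, 0), (0.80, 1.7, 1), (0.90, 1.9, 1),      # structure known
    (0.15, None, 0), (0.30, None, 0), (0.70, None, 1), (0.85, None, 1),  # structure missing
]

# 1) Learn a base -> structural mapping on the subset that has structural data.
known = [(x, s) for x, s, _ in mutants if s is not None]
slope, intercept = fit_line([x for x, _ in known], [s for _, s in known])


def impute(x):
    """Estimate the structural feature from the base feature alone."""
    return slope * x + intercept


# 2) Fill in the structural feature for training mutants that lack it.
filled = [(x, s if s is not None else impute(x), y) for x, s, y in mutants]


# 3) Fit a nearest-centroid classifier on (base, structural) feature pairs.
def centroid(label):
    rows = [(x, s) for x, s, y in filled if y == label]
    return mean(r[0] for r in rows), mean(r[1] for r in rows)


centroids = {label: centroid(label) for label in (0, 1)}


def predict(x):
    """Classify an unseen mutation; only the base feature is required."""
    s = impute(x)
    dist = {lab: (x - cx) ** 2 + (s - cs) ** 2 for lab, (cx, cs) in centroids.items()}
    return min(dist, key=dist.get)


print(predict(0.75), predict(0.2))
```

The point of the sketch is the data flow: structural measurements are consumed only at training time, and at prediction time the structural feature is estimated from inputs that every mutation has. A realistic version would use many features and a nonlinear imputer and classifier.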
