Background: Antibodies play a critical role in immune defense. Their antigen specificity is governed primarily by the unique sequences of their heavy chains, which makes them invaluable tools in research and diagnostics. High-throughput sequencing technologies have enabled comprehensive profiling of the immune repertoire, generating vast antibody sequence datasets that require advanced analytical methods.

Methods: We used curated antibody sequences from NCBI databases to develop computational models that classify antibodies into predefined antigen classes. We extracted features from the heavy-chain sequences covering physicochemical properties, structural composition, sequence order, and evolutionary information. These features were fed into machine-learning classifiers to predict antigen specificity across five classes of antibodies: anti-dengue virus, anti-influenza virus, anti-tetanus bacillus, anti-SARS-CoV-2, and anti-Mycobacterium tuberculosis.

Results: Five tree-based machine-learning models were evaluated, with CatBoost achieving the highest accuracy of 0.7713. A stacking model that combines multiple algorithms improved the accuracy to 0.7803. A Feature-Based Transformer deep-learning architecture was also implemented, yielding an accuracy of 0.7399 and an F1-score of 0.6761. To elucidate the key determinants of antibody-antigen interactions, we applied the SHAP framework to assess feature importance. Among the top 30 contributing features, those representing sequence order and evolutionary information predominated, with amino acids such as cysteine (C), isoleucine (I), histidine (H), and phenylalanine (F) exhibiting notable SHAP values. Cysteine (C) emerged as the most influential feature, underscoring its critical role in antibody structure and function. Specific antibodies contributed variably to these key features; for instance, the anti-tuberculosis antibody accounted for approximately 11% of a sequence-order feature associated with alanine (A), while the anti-SARS-CoV-2 antibody contributed about 9.26% to a feature associated with isoleucine (I).

Conclusions: Our study demonstrates the efficacy of machine-learning and deep-learning approaches in classifying antibodies into specific antigen categories, providing sequence-based insights into features associated with antibody specificity. These findings have significant implications for the mechanistic understanding, isolation, and development of potential therapeutic antibodies.
Lin et al. studied this question.
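The Methods describe turning each heavy-chain sequence into numeric features before classification. The abstract does not specify the exact descriptors, so the following is only a minimal sketch under stated assumptions: it computes amino acid composition and a simple gapped same-residue pair frequency as a stand-in for a sequence-order feature. The function names, the feature names (`AAC_*`, `ORD_*`), and the choice of lags are hypothetical and are not taken from the study.

```python
from collections import Counter

# The 20 standard amino acids, one-letter codes.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(seq):
    """Return the fraction of each standard residue in the sequence."""
    seq = seq.upper()
    counts = Counter(ch for ch in seq if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return {f"AAC_{aa}": counts[aa] / total for aa in AMINO_ACIDS}


def gapped_pair_frequency(seq, residue, lag):
    """Fraction of position pairs (i, i + lag) where both hold `residue`.

    This is an illustrative sequence-order descriptor (an assumption),
    not the descriptor used by the authors.
    """
    seq = seq.upper()
    n_pairs = len(seq) - lag
    if lag < 1 or n_pairs <= 0:
        return 0.0
    hits = sum(
        1 for i in range(n_pairs)
        if seq[i] == residue and seq[i + lag] == residue
    )
    return hits / n_pairs


def featurize(seq, lags=(1, 2, 3)):
    """Build one flat feature dictionary for a heavy-chain sequence."""
    features = amino_acid_composition(seq)
    for aa in AMINO_ACIDS:
        for lag in lags:
            features[f"ORD_{aa}_lag{lag}"] = gapped_pair_frequency(seq, aa, lag)
    return features


if __name__ == "__main__":
    # Illustrative heavy-chain fragment, not a sequence from the dataset.
    fragment = "EVQLVESGGGLVQPGGSLRLSCAAS"
    feats = featurize(fragment)
    print(len(feats), feats["AAC_C"], feats["ORD_G_lag1"])
```

A feature dictionary of this kind could be stacked row-wise into a matrix and passed to any tree-based classifier. Keeping one named column per residue and lag also makes per-feature attributions, such as SHAP values for cysteine-related features, straightforward to read.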