Major histocompatibility complex class I (MHC I) plays a crucial role in immune function. The complex typically binds short protein fragments, 8-9 amino acid residues in length, referred to as epitopes. In this study, we investigated the differences between peptides that bind to the complex (dataset N1) and those that do not (dataset N0). To compare N1 and N0, we applied Z-score analysis to test for statistically significant differences in six physicochemical properties: aliphatic index (αi), charge (Zi), hydrophobicity (Hi), isoelectric point (pIi), molecular weight (Mi), and instability index (IIi). All of these properties except the instability index depend solely on the amino acid composition of a peptide and not on sequence-specific features. For the properties evaluated, the Z-scores indicated no significant differences between N1 and N0; the largest values were 0.30 for the aliphatic index and 0.29 for hydrophobicity. The most robust and reliable separation between N1 and N0 was achieved with the r-value method, which yielded a classification accuracy of approximately 70% and a Z-score of 0.63. This result is close to the separation accuracy of 75% obtained with the MHCflurry program. Analysis of the amino acid distributions in N1 and N0 showed that residues such as tyrosine, phenylalanine, isoleucine, leucine, and valine occur more frequently than cysteine, tryptophan, arginine, and lysine in the octa- and nonapeptide epitopes that bind noncovalently to MHC class I. Using bioinformatics analysis and artificial intelligence approaches, we demonstrated the extent to which binding and non-binding peptides can be discriminated based solely on amino acid composition.
Lobanov et al. (Sun,) studied this question.
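To make the composition-based comparison concrete, the following minimal Python sketch computes one of the listed properties, the aliphatic index, and a Z-score between two sets of peptides. It rests on several assumptions that the text does not confirm: the aliphatic index follows Ikai's standard definition, the Z-score is taken as the absolute difference of the means divided by the square root of the summed sample variances, and the peptides in the usage section are hypothetical toy sequences rather than entries from N1 or N0. The function names are illustrative only.

```python
from collections import Counter
from math import sqrt
from statistics import mean, stdev


def aliphatic_index(peptide: str) -> float:
    """Aliphatic index by Ikai's formula (assumed to match the property used here).

    AI = X(Ala) + 2.9 * X(Val) + 3.9 * (X(Ile) + X(Leu)),
    where X is the mole percent of each residue. The value depends only
    on amino acid composition, not on residue order.
    """
    counts = Counter(peptide.upper())
    n = len(peptide)

    def mole_percent(residue: str) -> float:
        return 100.0 * counts[residue] / n

    return (
        mole_percent("A")
        + 2.9 * mole_percent("V")
        + 3.9 * (mole_percent("I") + mole_percent("L"))
    )


def z_score(values_1, values_0) -> float:
    """Separation between two samples (ASSUMED definition, not from the text).

    Z = |mean_1 - mean_0| / sqrt(var_1 + var_0), using sample variances.
    Each sample needs at least two values.
    """
    var_1 = stdev(values_1) ** 2
    var_0 = stdev(values_0) ** 2
    return abs(mean(values_1) - mean(values_0)) / sqrt(var_1 + var_0)


if __name__ == "__main__":
    # Hypothetical toy peptides; these are NOT taken from datasets N1 or N0.
    binders = ["YLLPAIVHI", "KVAELVHFL", "FLPSDFFPSV"]
    non_binders = ["KRKCWDGSE", "RCKWDEGNS", "DKRWCGSTE"]

    ai_binders = [aliphatic_index(p) for p in binders]
    ai_non_binders = [aliphatic_index(p) for p in non_binders]
    print("Z-score (aliphatic index):", z_score(ai_binders, ai_non_binders))
```

Because `aliphatic_index` works from residue counts alone, any permutation of a peptide gives the same value, which illustrates what "depends solely on amino acid composition" means. The same `z_score` helper could be applied to any other per-peptide property, provided the Z-score definition assumed above matches the one intended.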