Antimicrobial resistance (AMR), driven by the overuse of antibiotics, is becoming a global health crisis. Antimicrobial peptides (AMPs), small peptides that can inhibit bacteria, parasites, viruses, and fungi, have great therapeutic potential as a response to this crisis. This study asks two questions: whether machine learning models built on numerical sequence representations (k-mer bag-of-words, physicochemical descriptors, and amino acid composition) can predict whether a peptide is antimicrobial, and which of these representations yields the most accurate predictions. Six machine learning algorithms were trained with each sequence representation: Support Vector Machine, Gradient Boosting, Decision Tree, Logistic Regression, Random Forest, and Neural Network. The results indicate that, of the three representations, amino acid composition was best suited to the data, and that the Neural Network was the best-performing of the six models. These results add to the body of knowledge on binary AMP classification with machine learning and may help facilitate the identification and discovery of new AMPs to mitigate the growing AMR crisis.
Krithik Rajinikanth studied this question.
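To make two of the sequence representations named in the abstract concrete, here is a minimal sketch of amino acid composition and k-mer bag-of-words features. It is an illustration only and not the study's code: the function names, the choice of k = 2, the 20-letter alphabet constant, and the example peptide string are all assumptions introduced here.

```python
from collections import Counter
from itertools import product

# Assumption: only the 20 standard residues are counted.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(seq):
    """Return the fraction of each standard residue in seq (20 values)."""
    counts = Counter(seq.upper())
    total = sum(counts[a] for a in AMINO_ACIDS)
    if total == 0:
        # No standard residues at all: return an all-zero vector.
        return [0.0] * len(AMINO_ACIDS)
    return [counts[a] / total for a in AMINO_ACIDS]


def kmer_bag_of_words(seq, k=2):
    """Return counts of every possible k-mer (20**k values) found in seq."""
    seq = seq.upper()
    # Fixed vocabulary so every peptide maps to a vector of the same length.
    vocab = ["".join(p) for p in product(AMINO_ACIDS, repeat=k)]
    # Slide a window of width k over the sequence and count each substring.
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    return [counts[kmer] for kmer in vocab]


# Hypothetical peptide string, used only to show the output shapes.
peptide = "KKLLKKLLKK"
aac = amino_acid_composition(peptide)
bow = kmer_bag_of_words(peptide, k=2)
print(len(aac), len(bow))
```

Either vector could then be passed as one input row to any of the six classifiers listed above. Amino acid composition always has 20 dimensions, while the k-mer vocabulary grows as 20^k, which is one practical difference between the two representations.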