Antimicrobial peptides (AMPs) are short amino acid sequences that play a critical role in immune defense and have gained attention as potential alternatives to traditional antibiotics. Because identifying new AMPs experimentally is difficult, machine learning approaches have been explored for predicting antimicrobial activity directly from peptide sequences. In this study, baseline machine learning models were applied to classify peptide sequences as antimicrobial or non-antimicrobial using simple sequence-based feature representations. Peptide sequences were converted into numerical features using a k-mer bag-of-words approach, and these features were used to train logistic regression and random forest classifiers. Model performance was evaluated on a held-out test set using confusion matrices and receiver operating characteristic (ROC) curves. Both models performed close to random guessing, with accuracy near 50% and area under the ROC curve (AUC) close to 0.5. These results indicate that baseline machine learning models using k-mer sequence features alone are insufficient for reliably identifying antimicrobial peptides. This study highlights the need for more advanced feature representations and modeling approaches to improve predictive performance in AMP classification.
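To make the feature representation and the evaluation metric concrete, the following is a minimal standard-library sketch, not the study's actual implementation. It shows how overlapping k-mers might be counted into a bag-of-words matrix and how an ROC AUC can be computed from classifier scores. The function names (`kmer_counts`, `bag_of_words`, `roc_auc`), the choice of k = 2, and the toy sequences are all assumptions made for illustration; the abstract does not state the value of k or the data used.

```python
from collections import Counter


def kmer_counts(sequence, k=2):
    """Count the overlapping k-mers in a single peptide sequence."""
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))


def bag_of_words(sequences, k=2):
    """Build a k-mer count matrix.

    Returns (vocabulary, matrix), where matrix[i][j] is the number of times
    vocabulary[j] occurs in sequences[i].
    """
    counts = [kmer_counts(seq, k) for seq in sequences]
    # The vocabulary is every k-mer seen in any sequence, in sorted order.
    vocabulary = sorted(set().union(*counts))
    matrix = [[c[kmer] for kmer in vocabulary] for c in counts]
    return vocabulary, matrix


def roc_auc(labels, scores):
    """ROC AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a correct pair.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present to compute AUC")
    correct = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return correct / (len(positives) * len(negatives))


# Toy peptides, invented for illustration only (not from the study's data).
toy_sequences = ["KKLLKK", "GIGKFL", "AAAA"]
vocabulary, features = bag_of_words(toy_sequences, k=2)

# Hypothetical labels and classifier scores, used only to exercise roc_auc.
toy_auc = roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.1])
```

In this formulation, a model whose scores carry no information about the label ranks positive and negative examples correctly about half the time, which is why an AUC close to 0.5 is read as performance near random guessing.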
Matthew Cho studied this question.