Identifying anti-inflammatory peptides is crucial for developing new therapeutics for inflammatory and autoimmune diseases. The objective of this study was to identify the best preprocessing method and machine learning model for predicting anti-inflammatory peptides from their primary amino acid sequence, evaluating models on F1 score, classification accuracy (CA), precision (Prec), and recall. The data set was preprocessed in three different ways: through a formula, into k-mer bags, and through the estimation of physicochemical properties. For each preprocessing method, six models were trained: logistic regression, support vector machines, decision trees, gradient boosting, random forest, and neural networks. These models were assessed through cross-validation to ensure consistent results and were then evaluated on the metrics listed above. The models trained on formula-based preprocessing had higher performance metrics overall. The random forest model demonstrated higher and more consistent performance than the other models. These results highlight the effectiveness of machine learning in predicting peptide behavior and exemplify its potential for growth in similar fields.
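As an illustration of the k-mer bag preprocessing named above, the following is a minimal sketch rather than the study's actual pipeline. The abstract does not state the value of k, whether k-mers overlap, or how the bags are vectorized, so the choice of k = 2, the overlapping windows, and the sorted shared vocabulary are all assumptions. The function names `kmer_bag` and `to_feature_vector` and the two example sequences are hypothetical.

```python
from collections import Counter


def kmer_bag(sequence, k=2):
    """Count every overlapping k-mer in a peptide sequence.

    Assumption: overlapping windows with a stride of 1; the source does
    not say how its k-mers were extracted.
    """
    sequence = sequence.strip().upper()
    # A sequence shorter than k yields an empty bag.
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))


def to_feature_vector(bag, vocabulary):
    """Turn a k-mer bag into a fixed-length count vector."""
    return [bag.get(kmer, 0) for kmer in vocabulary]


# Made-up peptide sequences, used only to show the shape of the output.
peptides = ["GLFDIVKK", "KKLFDG"]

bags = [kmer_bag(p, k=2) for p in peptides]

# Shared vocabulary: every k-mer seen in any sequence, in sorted order,
# so that all feature vectors have the same columns.
vocabulary = sorted(set().union(*bags))

# One row per peptide, one column per k-mer.
matrix = [to_feature_vector(bag, vocabulary) for bag in bags]

for peptide, row in zip(peptides, matrix):
    print(peptide, row)
```

Each row of `matrix` could then serve as the input features for any of the six classifiers listed in the abstract. A larger k gives more specific features at the cost of a much larger and sparser vocabulary.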
Matthew Iwamoto (Fri,) studied this question.