What question did this study set out to answer?

This project aims to evaluate various featurization strategies and classification models for predicting pro-inflammatory peptides.

February 14, 2026 | Open Access

Comparison of Featurizers and Binary Classification Models on the Proinflammatory Prediction Task

Key Points

Compared multiple featurization strategies: amino-acid composition, physicochemical characteristics, and k-mer representations.
Applied up-sampling to address class imbalance in the training dataset.
Trained and evaluated several classification models, including logistic regression, support vector machines, and random forest.
Used confusion matrices and ROC curves to assess model performance.
Random Forest achieved the highest AUC values among all models, indicating superior predictive power.
Other models showed varied performance, but none consistently matched the effectiveness of Random Forest.
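
As an illustration of two of the featurization strategies listed above, the following is a minimal sketch of amino-acid composition and k-mer (bag-of-words) features. The function names, the 20-letter alphabet constant, and the default k=2 are assumptions made for this example; the study's own feature code and choice of k are not given here.

```python
from collections import Counter

# Assumption: features are restricted to the 20 standard residues.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def aa_composition(peptide):
    """Return the fraction of each standard amino acid in the peptide."""
    counts = Counter(peptide.upper())
    total = sum(counts[a] for a in AMINO_ACIDS)
    if total == 0:
        # Empty or fully non-standard sequence: all-zero feature vector.
        return [0.0] * len(AMINO_ACIDS)
    return [counts[a] / total for a in AMINO_ACIDS]


def kmer_counts(peptide, k=2):
    """Count overlapping k-mers, a bag-of-words view of the sequence."""
    seq = peptide.upper()
    # Sequences shorter than k yield an empty Counter.
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
```

Composition yields a fixed-length vector of 20 fractions regardless of peptide length, while the k-mer counts would still need to be mapped onto a shared vocabulary before being passed to a classifier.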

Abstract

This project compares several featurization strategies and machine-learning classification models for predicting whether peptides are pro-inflammatory. Because the training dataset was highly imbalanced, up-sampling was applied to enlarge the minority (positive) class. Three feature representations were explored: amino-acid composition, amino-acid physicochemical characteristics, and k-mer (bag-of-words) representations. For each featurization, multiple classification models were trained and evaluated, including logistic regression, support vector machines, decision tree, random forest, gradient boosting, AdaBoost, and a neural network. Model performance was compared using confusion matrices and ROC curves. Across all featurizations, random forest consistently produced the highest AUC values, indicating greater predictive power than the other models; this is attributed to a combination of the method's sophistication and its performance in over-training regimes.
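
The up-sampling and AUC comparison described in the abstract can be sketched as follows. This is a minimal, standard-library illustration rather than the study's implementation: the function names, the fixed random seed, and the choice to up-sample until the two classes are exactly balanced are all assumptions. AUC is computed here through its pairwise interpretation, the probability that a randomly chosen positive example scores higher than a randomly chosen negative one.

```python
import random


def upsample_minority(samples, labels, seed=0):
    """Resample the minority class with replacement until classes balance.

    Assumption: labels are 0/1 and balancing is exact; the study's
    target ratio is not stated.
    """
    rng = random.Random(seed)
    pos = [s for s, y in zip(samples, labels) if y == 1]
    neg = [s for s, y in zip(samples, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    if len(pos) < len(neg):
        minority, majority, minority_label = pos, neg, 1
    else:
        minority, majority, minority_label = neg, pos, 0
    # Draw extra minority examples with replacement.
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    return list(samples) + extra, list(labels) + [minority_label] * len(extra)


def roc_auc(labels, scores):
    """AUC as P(positive score > negative score); ties count as 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

Up-sampling would normally be applied to the training split only, so that duplicated positives do not leak into the data used to compute the ROC curve. The pairwise AUC above is quadratic in the number of examples, which is fine for illustration but slower than a rank-based computation on large test sets.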
