What question did this study set out to answer?

The study aimed to use machine learning to predict anti-inflammatory peptides more efficiently than traditional laboratory methods.

February 14, 2026 · Open Access

Predicting Anti-Inflammatory Peptides Using Machine Learning

Key Points

The study aimed to use machine learning to predict anti-inflammatory peptides more efficiently than traditional laboratory methods.
The dataset contained 4194 peptide sequences, each with a binary label.
Sequences were featurized as n-grams in Orange Data Mining software.
Models were tested with 10-fold cross-validation.
Models were evaluated by AUC to account for class imbalance.
Logistic Regression achieved the highest AUC, 0.813, indicating strong classification performance.
Balanced random forest yielded a somewhat lower AUC of ≈0.757.
SVM performed poorly, with an AUC of ≈0.529, close to the 0.5 expected from random guessing.
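
To make the featurization step concrete, here is a minimal sketch of n-gram counting on peptide sequences. It is not the Orange workflow used in the study: the function names, the choice of n = 2, and the two example sequences are all assumptions made for illustration, and the source does not state which n-gram length was used.

```python
from collections import Counter


def ngram_counts(sequence, n=2):
    """Count overlapping n-grams (runs of n adjacent residues) in one peptide."""
    seq = sequence.strip().upper()
    return Counter(seq[i:i + n] for i in range(len(seq) - n + 1))


def featurize(sequences, n=2):
    """Map peptide strings to count vectors over a shared, sorted n-gram vocabulary."""
    counts = [ngram_counts(s, n) for s in sequences]
    vocabulary = sorted(set().union(*counts))
    # Each row has one column per n-gram; most entries are 0, so the matrix is sparse.
    vectors = [[c[gram] for gram in vocabulary] for c in counts]
    return vocabulary, vectors


# Hypothetical sequences, not taken from the study's dataset.
peptides = ["GLFDIVKK", "KKLLKK"]
vocabulary, vectors = featurize(peptides, n=2)
for peptide, row in zip(peptides, vectors):
    print(peptide, dict(zip(vocabulary, row)))
```

With the 20 standard amino acids there are up to 400 possible 2-grams and 8,000 possible 3-grams, while a short peptide contains only a handful of them. That is why feature matrices built this way tend to be high-dimensional and sparse.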

Abstract

Chronic inflammation plays a harmful role in diseases such as asthma, eczema, arthritis, and cardiovascular disease. Traditional treatments, such as over-the-counter non-steroidal anti-inflammatory drugs (NSAIDs), carry long-term risks, which creates a need for new therapeutic agents. One candidate is anti-inflammatory peptides, short sequences of amino acids that help inhibit inflammatory pathways. Anti-inflammatory peptides are rare, and discovering them can take hours of expensive lab synthesis and biological validation; machine learning can shorten this process and make it cheaper. This study used a dataset of 4194 binary-labeled peptide sequences in a CSV file, provided by George Mason University’s Young Scholars Research NextGen Science: Machine Learning & Bioinformatics program. The data was imported into Orange Data Mining software for n-gram featurization, model training, and testing with 10-fold cross-validation. AUC was the primary evaluation metric because it is insensitive to the class imbalance present in this dataset. Logistic Regression had the most robust classification performance and supported the hypothesis with an AUC of 0.813, well over 0.7. Balanced random forest also supported the hypothesis, with a somewhat lower AUC of ≈0.757. SVM had the lowest AUC, ≈0.529, likely because of noise in the high-dimensional, sparse feature data, which linear models such as logistic regression are better at classifying. This study shows that machine learning models trained and tested on sequence-derived n-gram features alone can discriminate well enough to speed up the discovery of therapeutic peptides, saving time and cutting costs.
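
The evaluation described above can be sketched in plain Python. The code below is an illustration under stated assumptions, not the study's Orange pipeline: `stratified_folds` and `roc_auc` are hypothetical helper names, stratifying the folds is an assumption, and the labels and scores are invented. It shows how samples can be divided into 10 folds and why AUC does not shift when the negative class is made larger.

```python
import random


def stratified_folds(labels, k=10, seed=0):
    """Assign sample indices to k folds, keeping class proportions roughly equal."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        indices = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(indices)
        # Deal the shuffled indices of this class round-robin across the folds.
        for position, i in enumerate(indices):
            folds[position % k].append(i)
    return folds


def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Invented labels (1 = anti-inflammatory) and classifier scores.
labels = [1, 1, 0, 0, 0, 0]
scores = [0.9, 0.4, 0.5, 0.3, 0.2, 0.1]
auc_small = roc_auc(labels, scores)

# Repeat every negative five times: the class ratio changes, the ranking quality does not.
labels_skewed = labels[:2] + labels[2:] * 5
scores_skewed = scores[:2] + scores[2:] * 5
auc_skewed = roc_auc(labels_skewed, scores_skewed)
print(auc_small, auc_skewed)

# Each index appears in exactly one fold; each fold serves once as the test set.
folds = stratified_folds([1] * 30 + [0] * 70, k=10)
print([len(fold) for fold in folds])
```

Because AUC depends only on how positives rank relative to negatives, a score of 0.5 corresponds to random ordering and 1.0 to perfect separation, regardless of how many examples each class has.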
