April 17, 2025 · Open Access

A deep learning based multiple RNA methylation sites prediction across species

Abstract

Methylation of ribonucleic acid (RNA) is a post-transcriptional modification that has a major effect on many biological processes. Identifying RNA methylation sites is essential for understanding gene regulation and for finding potential therapeutic targets. The contribution of this study is threefold. First, it introduces two novel deep learning models for predicting RNA methylation sites: a Convolutional Neural Network (CNN)-based model and a transformer-based model. Both are trained and evaluated on human and mouse benchmark datasets for the m1A, m6A, m5C, and A-to-I modification types. Second, it investigates the effect of different encoding techniques on model performance, including one-hot encoding, Gene2Vec, and position encoding, as well as their combinations using concatenation, summation, and multiplication. Third, it compares the prediction strength of motif-based and attention-based classifiers. The results demonstrate that both models achieve high accuracy in predicting RNA methylation sites, outperforming existing state-of-the-art approaches on multiple performance metrics. Moreover, the choice of encoding strategy has a substantial impact on prediction accuracy; the best approach varies with the species and the type of methylation. The findings also indicate that the motif-based classifier is more stable than the attention-based classifier when predicting RNA methylation. In the future, we aim to expand our research beyond human and mouse models to explore RNA methylation in plants.

• This study introduces two deep learning models for predicting RNA methylation sites (m1A, m6A, m5C, and A-to-I editing) in human and mouse: a Convolutional Neural Network (CNN)-based model and a transformer-based model.
• Three primary encoding methods were used: one-hot encoding, Gene2Vec, and position encoding. They were fused using concatenation, summation, and multiplication to explore their combined effects.
• The results indicate that the motif-based classifier (CNN) is more stable across different encoding techniques than the attention-based classifier (transformer).
• The study reports state-of-the-art (SOTA) results.
• The attention-based model was generally more efficient than the CNN-based model in terms of FLOPs.
• Paired t-tests were conducted to assess whether the difference between the two proposed models is statistically significant.
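
The three fusion strategies named above (concatenation, summation, multiplication) can be illustrated with a minimal sketch. It is not the authors' implementation. The function names (`one_hot`, `sinusoidal_position`, `fuse`), the `ACGU` alphabet, and the transformer-style sinusoidal scheme for position encoding are all assumptions made for illustration. Gene2Vec is left out because it relies on pretrained embeddings that are not described here.

```python
import math

NUCLEOTIDES = "ACGU"  # assumed RNA alphabet; the paper's vocabulary is not stated here


def one_hot(seq):
    """Encode each nucleotide as a 4-dimensional indicator vector."""
    return [[1.0 if base == n else 0.0 for n in NUCLEOTIDES] for base in seq]


def sinusoidal_position(length, dim=4):
    """Sinusoidal position encoding (an assumed scheme, chosen so that its
    dimension matches the one-hot vectors)."""
    rows = []
    for pos in range(length):
        row = []
        for i in range(dim):
            angle = pos / (10000 ** (2 * (i // 2) / dim))
            # Even indices use sine, odd indices use cosine.
            row.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
        rows.append(row)
    return rows


def fuse(a, b, mode):
    """Combine two per-position encodings of the same sequence length."""
    if mode == "concat":
        # Feature dimensions add up: dim(a) + dim(b).
        return [x + y for x, y in zip(a, b)]
    if mode == "sum":
        # Element-wise sum; both encodings must share one dimension.
        return [[p + q for p, q in zip(x, y)] for x, y in zip(a, b)]
    if mode == "mul":
        # Element-wise product; both encodings must share one dimension.
        return [[p * q for p, q in zip(x, y)] for x, y in zip(a, b)]
    raise ValueError(f"unknown fusion mode: {mode}")


seq = "GGACU"  # hypothetical example sequence
oh = one_hot(seq)
pe = sinusoidal_position(len(seq))
for mode in ("concat", "sum", "mul"):
    fused = fuse(oh, pe, mode)
    print(mode, len(fused), len(fused[0]))
```

Concatenation widens each position's feature vector (here from 4 to 8), while summation and multiplication keep the width fixed and therefore need encodings of equal dimension. That difference in shape and in how information is mixed is one plausible reason the best fusion choice could vary by species and modification type.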
