Methylation of ribonucleic acid (RNA) is an essential post-transcriptional modification that affects many biological processes. Identifying RNA methylation sites is essential for understanding gene regulation and for finding potential therapeutic targets. This study makes three contributions. First, it introduces two novel deep learning models for predicting RNA methylation sites: a Convolutional Neural Network (CNN)-based model and a transformer-based model. Both models are trained and evaluated on human and mouse benchmark datasets for four modification types: m1A, m6A, m5C, and A-to-I editing. Second, it investigates how different encoding techniques affect model performance, including one-hot encoding, Gene2Vec, and position encoding, as well as their combinations by concatenation, summation, and multiplication. Third, it compares the predictive strength of motif-based and attention-based classifiers. The results show that both models predict RNA methylation sites with high accuracy, outperforming existing state-of-the-art approaches on multiple performance metrics. Moreover, the choice of encoding strategy has a substantial impact on prediction accuracy; the best approach varies with the species and the type of methylation. The findings also indicate that the motif-based classifier is more stable than the attention-based classifier when predicting RNA methylation. In future work, we aim to extend this research beyond human and mouse to RNA methylation in plants.

• This study introduces two deep learning (DL) models for predicting RNA methylation sites (m1A, m6A, m5C, and A-to-I editing) in human and mouse: a Convolutional Neural Network (CNN)-based model and a transformer-based model.
• Three primary encoding methods were used (one-hot encoding, Gene2Vec, and position encoding) and fused by concatenation, summation, and multiplication to explore their combined effects.
• The results indicate that the motif-based classifier (CNN) is more stable across encoding techniques than the attention-based classifier (transformer).
• The study reports state-of-the-art (SOTA) results.
• The attention-based model generally compared favorably with the CNN-based model in floating-point operations (FLOPs).
• Paired t-tests were conducted to assess whether the difference between the two proposed models is statistically significant.
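
To make the three fusion strategies concrete, the following is a minimal sketch and not the authors' implementation. It assumes an RNA alphabet of A, C, G, U, a sinusoidal form of position encoding, and a random lookup table (`random_embedding`, a hypothetical stand-in) in place of a learned embedding such as Gene2Vec. All function names are illustrative.

```python
import math
import random

NUCLEOTIDES = "ACGU"  # assumed alphabet; the benchmark data may use T instead of U


def one_hot(seq):
    """One-hot encode a sequence: one 4-dimensional row per nucleotide."""
    return [[1.0 if base == n else 0.0 for n in NUCLEOTIDES] for base in seq]


def position_encoding(length, dim):
    """Sinusoidal position encoding (an assumed variant, not confirmed by the source)."""
    rows = []
    for pos in range(length):
        row = []
        for i in range(dim):
            angle = pos / (10000 ** (2 * (i // 2) / dim))
            row.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
        rows.append(row)
    return rows


def random_embedding(seq, dim, seed=0):
    """Hypothetical stand-in for a learned embedding such as Gene2Vec."""
    rng = random.Random(seed)
    table = {n: [rng.uniform(-1.0, 1.0) for _ in range(dim)] for n in NUCLEOTIDES}
    return [list(table[base]) for base in seq]


def fuse(a, b, mode):
    """Fuse two per-position encodings by 'concat', 'sum', or 'mul'."""
    if len(a) != len(b):
        raise ValueError("encodings must cover the same number of positions")
    if mode == "concat":
        # Concatenation joins the feature vectors, so widths may differ.
        return [ra + rb for ra, rb in zip(a, b)]
    if len(a[0]) != len(b[0]):
        raise ValueError("sum and mul need encodings of equal width")
    if mode == "sum":
        return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]
    if mode == "mul":
        return [[x * y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]
    raise ValueError(f"unknown fusion mode: {mode}")


if __name__ == "__main__":
    seq = "ACGUA"
    oh = one_hot(seq)
    pe = position_encoding(len(seq), 4)
    for mode in ("concat", "sum", "mul"):
        fused = fuse(oh, pe, mode)
        print(mode, len(fused), "positions x", len(fused[0]), "features")
```

Concatenation widens the feature vector and keeps each encoding intact, while summation and multiplication keep the width fixed but require both encodings to share it. That trade-off is one plausible reason the best combination could differ by species and modification type.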
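
The paired t-test mentioned in the highlights can be illustrated with a small sketch. It assumes the two models are scored on the same datasets or folds, so that the scores form matched pairs; the pairing scheme and the function name `paired_t_statistic` are assumptions, and the numbers in the demo are made up rather than taken from the paper. Only the t statistic and degrees of freedom are computed; a p-value would additionally need the Student t distribution.

```python
import math
from statistics import mean, stdev


def paired_t_statistic(scores_a, scores_b):
    """Return (t, degrees_of_freedom) for matched score pairs."""
    if len(scores_a) != len(scores_b):
        raise ValueError("paired test needs the same number of scores per model")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    if n < 2:
        raise ValueError("need at least two pairs")
    sd = stdev(diffs)  # sample standard deviation of the differences
    if sd == 0:
        raise ValueError("all differences are identical; t is undefined")
    t = mean(diffs) / (sd / math.sqrt(n))
    return t, n - 1


if __name__ == "__main__":
    # Made-up illustrative scores, not results from the paper.
    cnn_scores = [0.91, 0.88, 0.93, 0.90]
    transformer_scores = [0.89, 0.90, 0.92, 0.86]
    t, df = paired_t_statistic(cnn_scores, transformer_scores)
    print(f"t = {t:.3f} with {df} degrees of freedom")
```

A large absolute t relative to the critical value for the given degrees of freedom would suggest that the two models differ by more than fold-to-fold noise.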
Shah et al. studied this question.