Abstract

Predicting interactions between piRNA and mRNA sequences is central to understanding post-transcriptional regulation in the germline and to designing perturbations that modulate PIWI (P-element-induced wimpy testis)-guided silencing. More accurate prediction of RNA sequence interactions can therefore benefit the many medical fields that depend on it. This paper proposes rbpCNN, a lightweight convolutional neural network (CNN) that augments nucleotide-pair encoding with biophysically motivated interaction channels prior to learning. Five channels are added (one compatibility channel, two helix-run channels, one positional channel, and one structural channel) to support the CNN and improve prediction accuracy. In fivefold cross-validation, rbpCNN achieves an AUC (area under the receiver operating characteristic curve) of 96.55% and an accuracy of 90.66%; on a separate, fully independent external dataset, it achieves an AUC of 94.19% and an accuracy of 86.74%. This performance is competitive with, and on several metrics exceeds, previously reported results on the same benchmark.
Gürhanlı et al. studied this question.
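To make the idea of interaction channels concrete, the following is a minimal sketch of how five such channels might be computed for one piRNA/mRNA pair. The abstract names the channels but does not define them, so every definition here is an assumption made for illustration: the pairing scores (1.0 for Watson-Crick, 0.5 for G-U wobble), the reading of the two helix-run channels as backward and forward run lengths along the diagonal, the normalized piRNA index as the positional channel, and a caller-supplied per-base accessibility vector as the structural channel. The function names `pair_score` and `build_channels` are hypothetical, and the mRNA segment is assumed to be already reversed so that duplex pairs lie on the main diagonal.

```python
# Illustrative sketch only: these channel definitions are assumptions,
# not the definitions used by rbpCNN.

WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
WOBBLE = {("G", "U"), ("U", "G")}


def pair_score(a, b):
    """Assumed scoring: 1.0 for Watson-Crick, 0.5 for G-U wobble, else 0.0."""
    if (a, b) in WATSON_CRICK:
        return 1.0
    if (a, b) in WOBBLE:
        return 0.5
    return 0.0


def build_channels(pirna, mrna, accessibility=None):
    """Return five n-by-m grids (n = len(pirna), m = len(mrna)).

    `mrna` is assumed to be reversed already, so a perfect duplex
    appears on the main diagonal. `accessibility` is a hypothetical
    per-base structural score for the mRNA (defaults to zeros).
    """
    n, m = len(pirna), len(mrna)
    if accessibility is None:
        accessibility = [0.0] * m
    if len(accessibility) != m:
        raise ValueError("accessibility must have one value per mRNA base")

    # Compatibility channel: can base i of the piRNA pair with base j of the mRNA?
    compat = [[pair_score(pirna[i], mrna[j]) for j in range(m)] for i in range(n)]

    # Helix-run channel 1: consecutive pairable cells on the diagonal ending at (i, j).
    run_back = [[0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            if compat[i][j] > 0:
                prev = run_back[i - 1][j - 1] if i > 0 and j > 0 else 0
                run_back[i][j] = prev + 1

    # Helix-run channel 2: consecutive pairable cells on the diagonal starting at (i, j).
    run_fwd = [[0] * m for _ in range(n)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if compat[i][j] > 0:
                nxt = run_fwd[i + 1][j + 1] if i + 1 < n and j + 1 < m else 0
                run_fwd[i][j] = nxt + 1

    # Positional channel: piRNA index scaled to [0, 1], constant along each row.
    position = [[i / max(n - 1, 1)] * m for i in range(n)]

    # Structural channel: mRNA accessibility repeated for every piRNA row.
    structure = [[float(x) for x in accessibility] for _ in range(n)]

    return {
        "compat": compat,
        "run_back": run_back,
        "run_fwd": run_fwd,
        "position": position,
        "structure": structure,
    }


channels = build_channels("UGAC", "ACUG")
```

In a pipeline of this kind, the five grids would presumably be stacked with the nucleotide-pair encoding along the channel axis before being passed to the convolutional layers; how rbpCNN actually does this is not stated in the abstract.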