• We extend to secondary structures data augmentation techniques that were previously applied only to genetic sequences.
• We leverage these data augmentation techniques to improve our RNA secondary structure prediction tool DivideFold into a stronger version: DivideFold+.
• We make a web server for DivideFold+ publicly accessible at https://evryrna.ibisc.univ-evry.fr/DivideFold/ so that researchers can use it easily. For larger workloads, DivideFold+ can still be run locally.
• We include graphical representations of the DivideFold+ prediction in the web server, in the form of a dot-bracket representation of the predicted secondary structure and a 2D visualization.
• DivideFold+ provides not only a predicted secondary structure for an RNA sequence, but also a predicted partition of the structure into substructures that could be good candidates for functional subdomains. In the web server, the predicted dot-bracket and 2D visualization use a color scheme that makes the subdomains easy to identify.

Predicting the secondary structure of RNAs, particularly long RNAs, remains a challenging problem despite its importance in identifying the structural roles of RNAs. Deep-learning-based methods face a lack of data and cannot provide very accurate predictions for long RNAs. To overcome this difficulty, in previous work we presented DivideFold, a method that divides long RNAs into structurally independent, shorter fragments. This approach enables the overall secondary structure of the RNA to be inferred by predicting the secondary structure of each fragment, a much easier task. We present here an enhanced version, DivideFold+, that improves on it in several respects. Since our method is based on deep learning, we introduce a new data augmentation strategy specifically designed for RNA secondary structure prediction, which is more elaborate than the traditional ones used in the literature.
Our computational results show the benefit of this strategy. In addition to the predicted secondary structure, DivideFold+ provides a segmentation of that structure into subdomains, each subdomain corresponding to a fragment. As with protein domains, these subdomains can serve as potential candidates for functional domains in RNAs. Finally, we provide a user-friendly web server that allows visualization of the predicted secondary structure, as well as the different subdomains. DivideFold+, along with all the datasets used for this study, is publicly accessible on the EvryRNA platform at https://evryrna.ibisc.univ-evry.fr/.
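
To illustrate the divide-and-predict idea described above, the following is a minimal Python sketch of how fragment-level dot-bracket predictions could be mapped back onto the full sequence. It is not the DivideFold+ implementation: the names `assemble_structure`, `toy_predict`, `Interval` and `Fragment` are hypothetical, the partition is supplied by hand rather than predicted, and `toy_predict` is a placeholder that stands in for a real structure prediction model.

```python
from typing import Callable, List, Tuple

Interval = Tuple[int, int]   # half-open [start, end) range of sequence positions
Fragment = List[Interval]    # a fragment may cover several non-contiguous ranges


def toy_predict(seq: str) -> str:
    """Placeholder predictor (NOT a real folding model).

    Pairs the first and last base of the fragment when it is long enough
    to close a loop, and leaves everything else unpaired.
    """
    n = len(seq)
    return "(" + "." * (n - 2) + ")" if n >= 5 else "." * n


def assemble_structure(
    sequence: str,
    fragments: List[Fragment],
    predict: Callable[[str], str] = toy_predict,
) -> str:
    """Predict each fragment independently and merge the results.

    Assumes the fragments are structurally independent, i.e. no base pair
    links two different fragments.
    """
    structure = ["."] * len(sequence)
    for fragment in fragments:
        # Positions of the full sequence covered by this fragment, in order.
        positions = [i for start, end in fragment for i in range(start, end)]
        sub_seq = "".join(sequence[i] for i in positions)
        sub_struct = predict(sub_seq)
        if len(sub_struct) != len(positions):
            raise ValueError("predictor returned a structure of the wrong length")
        # Write the fragment's dot-bracket symbols back to their original positions.
        for pos, symbol in zip(positions, sub_struct):
            structure[pos] = symbol
    return "".join(structure)


# Hand-made partition of a 12-nt sequence: an outer fragment made of the
# two flanks, and an inner fragment made of the central region.
sequence = "GGCGAAAAGGCC"
fragments = [[(0, 3), (9, 12)], [(3, 9)]]
full_structure = assemble_structure(sequence, fragments)
```

In this sketch each fragment corresponds to one subdomain, so the same `fragments` list could also drive a per-subdomain coloring of the dot-bracket string, in the spirit of the web server's visualization.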
Omnes et al. (Fri,) studied this question.