In a living cell, DNA replication begins at multiple genomic sites, called replication origins. Identifying these origins and their underlying base sequence composition is crucial for understanding the replication process. Existing machine learning methods for origin prediction often require labor-intensive feature engineering or lack interpretability. Here, we employ DNABERT to predict yeast replication origins and uncover sequence motifs by combining attention maps with MEME, a classical bioinformatics tool. Our approach eliminates manual feature extraction and identifies biologically relevant motifs across datasets of varying complexity. This work advances interpretable machine learning in genomics, offering a potentially generalizable framework for origin prediction and motif discovery.
Piroozeh et al. studied this question.
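To make the attention-plus-MEME idea concrete, the following is a minimal sketch of one way high-attention stretches of a sequence could be turned into FASTA input for a motif finder such as MEME. It is not the authors' pipeline: the function names, the k-mer size of 6, the threshold, the minimum region length, and the toy per-base scores are all assumptions chosen for illustration. In practice the scores would come from the model's attention maps.

```python
from typing import List, Sequence, Tuple


def kmer_tokens(seq: str, k: int = 6) -> List[str]:
    """Split a DNA sequence into overlapping k-mers with stride 1.

    DNABERT-style models consume k-mer tokens; k = 6 is an assumed value.
    """
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def high_attention_regions(
    scores: Sequence[float], threshold: float, min_len: int = 4
) -> List[Tuple[int, int]]:
    """Return half-open (start, end) intervals where every score is at
    least `threshold` and the run is at least `min_len` positions long.

    The thresholding rule is a hypothetical choice, not the paper's.
    """
    regions: List[Tuple[int, int]] = []
    start = None
    for i, score in enumerate(scores):
        if score >= threshold:
            if start is None:
                start = i  # open a new run
        elif start is not None:
            if i - start >= min_len:
                regions.append((start, i))
            start = None  # close the run
    # Handle a run that extends to the end of the sequence.
    if start is not None and len(scores) - start >= min_len:
        regions.append((start, len(scores)))
    return regions


def to_fasta(seq: str, regions: Sequence[Tuple[int, int]], name: str = "seq") -> str:
    """Format the selected subsequences as FASTA records."""
    lines: List[str] = []
    for start, end in regions:
        lines.append(f">{name}:{start}-{end}")
        lines.append(seq[start:end])
    if not lines:
        return ""
    return "\n".join(lines) + "\n"


# Toy input: one score per base (made-up values, not model output).
seq = "ACGTTTTATGTTTAGGCA"
scores = [0.01] * 4 + [0.2] * 11 + [0.01] * 3
regions = high_attention_regions(scores, threshold=0.1)
fasta = to_fasta(seq, regions)
```

The resulting `fasta` string could be written to a file and passed to a motif discovery tool. Mapping token-level attention back to per-base scores is left out here, since the abstract does not describe how that step is done.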