Metazoan transcriptional regulation is mediated by modular DNA segments termed cis-regulatory modules (CRMs). Precisely identifying the target genes of CRMs is essential for understanding cellular functions and disease mechanisms. Although reporter assays can validate CRM targets, selecting candidates to test remains challenging because CRMs often act over long distances, bypassing nearby genes. Simple nearest-gene assignments are therefore unreliable, and methods based on epigenetic correlations remain limited. Enhancer-promoter interaction (EPI) prediction has improved CRM target identification; however, many existing EPI tools suffer from overfitting and overlook alternative regulatory mechanisms, such as non-coding RNAs (ncRNAs) transcribed from CRMs. To address these gaps, we developed CRM-Target Identifier (CRM-TI), a deep learning pipeline that integrates two mechanisms: chromatin interactions between CRMs and genes, and gene regulation related to ncRNA transcription from CRMs. Under a chromosome-wise partitioning scheme, CRM-TI achieved an unbiased test auROC of 90.0%, outperforming existing methods by at least 10%. Moreover, modeling both CRM-gene interactions and ncRNA-transcription-related regulation improved auROC by 21.9% over a model that considers CRM-gene interactions alone. These results demonstrate that combining multiple transcriptional regulatory mechanisms acting through CRMs provides a more complete framework for assigning CRMs to their target genes. CRM-TI is available at https://github.com/cobisLab/CRM-TI.
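The evaluation described above rests on two ideas: a chromosome-wise partition, which keeps training and test examples on different chromosomes so that correlated neighboring loci cannot leak between them, and auROC as the performance metric. The following is a minimal sketch under stated assumptions, not CRM-TI's actual code. It assumes each candidate CRM-gene pair is a tuple of (chromosome, CRM id, gene id, label) and that the test set is formed by holding out whole chromosomes. The helper names `split_by_chromosome` and `auroc`, the tuple layout, and the example chromosome and scores are all hypothetical. The real split is defined in the CRM-TI repository.

```python
from typing import AbstractSet, List, Sequence, Tuple

# Hypothetical record layout: (chromosome, crm_id, gene_id, label),
# where label is 1 for a true CRM-target pair and 0 otherwise.
Pair = Tuple[str, str, str, int]


def split_by_chromosome(
    pairs: Sequence[Pair], test_chroms: AbstractSet[str]
) -> Tuple[List[Pair], List[Pair]]:
    """Hold out entire chromosomes so no chromosome contributes to both sets."""
    train = [p for p in pairs if p[0] not in test_chroms]
    test = [p for p in pairs if p[0] in test_chroms]
    return train, test


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """auROC as the probability that a random positive outscores a random
    negative (the Mann-Whitney formulation); ties count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("auROC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy pairs on three chromosomes; chr8 is held out as the test chromosome.
    pairs = [
        ("chr1", "crmA", "GENE1", 1),
        ("chr1", "crmB", "GENE2", 0),
        ("chr2", "crmC", "GENE3", 1),
        ("chr8", "crmD", "GENE4", 1),
        ("chr8", "crmE", "GENE5", 0),
        ("chr8", "crmF", "GENE6", 0),
    ]
    train, test = split_by_chromosome(pairs, {"chr8"})
    # Made-up scores standing in for a trained model's test-set predictions.
    test_scores = [0.7, 0.7, 0.2]
    print(len(train), len(test), auroc([p[3] for p in test], test_scores))
```

Running the script prints `3 3 0.75`. The single positive ties one negative (0.5) and beats the other (1.0), giving 1.5 of 2 possible comparisons. Holding out whole chromosomes is stricter than a random split of pairs, because nearby CRMs and genes often share regulatory context, and a random split would let a model score well partly by recognizing neighbors seen during training.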
Yu et al. studied this question.