September 25, 2024 | Open Access

Fine-tuning sequence-to-expression models on personal genome and transcriptome data

Ruchir Rastogi (Berkeley College), Aniketh Janardhan Reddy (Illumina, United States), Ryan Chung (Scripps Institution of Oceanography)

Abstract

Genomic sequence-to-expression deep learning models, which are trained to predict gene expression and other molecular phenotypes across the reference genome, have recently been shown to have poor out-of-the-box performance in predicting gene expression variation across individuals based on their personal genome sequences. Here we explore whether additional training (fine-tuning) on paired personal genome and transcriptome data improves the performance of such sequence-to-expression models. Using Enformer as a representative pre-trained model, we explore various fine-tuning strategies. Our results show that fine-tuning improves cross-individual prediction performance over the baseline Enformer model for held-out individuals on genes seen during fine-tuning, with comparable performance to variant-based linear models commonly used in transcriptome-wide association studies. However, fine-tuning does not improve model generalizability on held-out genes, which contain sequences and variants unseen during fine-tuning, highlighting a remaining open challenge in the field.
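To make "cross-individual prediction performance" concrete, the sketch below scores each gene by how well predicted expression tracks observed expression across a set of held-out individuals. The abstract does not name the metric, so the use of Pearson correlation here is an assumption, and the function names and the dictionary layout (gene name mapped to one value per individual) are hypothetical choices for illustration, not the authors' code.

```python
import math


def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences of numbers."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        # Correlation is undefined when either side does not vary.
        return math.nan
    return cov / math.sqrt(var_x * var_y)


def per_gene_cross_individual_corr(observed, predicted):
    """Score each gene by correlation across individuals.

    observed, predicted: dict mapping gene name -> list of expression
    values, one per held-out individual, in the same individual order.
    (Hypothetical layout, assumed for this sketch.)
    """
    return {gene: pearson(observed[gene], predicted[gene]) for gene in observed}


# Toy values, invented purely to exercise the function.
observed = {"GENE_A": [1.0, 2.0, 3.0], "GENE_B": [2.0, 1.0, 0.0]}
predicted = {"GENE_A": [2.0, 4.0, 6.0], "GENE_B": [2.0, 4.0, 6.0]}
scores = per_gene_cross_individual_corr(observed, predicted)
```

Under this framing, the two settings in the abstract differ only in which genes are scored: genes seen during fine-tuning evaluated on held-out individuals, versus genes held out of fine-tuning entirely.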
