The artificial intelligence (AI)-driven generation of genetic sequences holds transformative potential for addressing global challenges in agriculture, medicine, and bioenergy. Traditional approaches, including hybridization, mutagenesis, and CRISPR-based editing, enable targeted modification of endogenous DNA, yet remain constrained by natural sequence diversity. Here we introduce PlantGFM, a plant-oriented genomic foundation model based on the Hyena operator. It was pre-trained on 10.84 billion nucleotides from 12 plant species and supports long-context (64 kb) prediction and sequence generation within a unified architecture. After fine-tuning on 10 annotated plant genomes, PlantGFM matched or exceeded the performance of specialized gene prediction tools. Beyond reproducing natural genes, it enables de novo design of novel gene candidates through the emergent capabilities of AI. Seven candidates selected through an AI-Human Knowledge fusion screening pipeline all showed transcriptional activity in Nicotiana benthamiana, and two showed stable protein expression, representing the first demonstration of DNA-RNA-protein expression of large language model (LLM)-generated sequences in plants. As a proof of concept, PlantGFM also exhibits emergent abilities in generating plant NLR genes. Our findings establish the feasibility of LLM technology for de novo plant gene design, providing a foundation for plant synthetic biology and AI-assisted breeding.
Li et al. (Wed,) studied this question.