Identifying common patterns of polymorphisms across SARS-CoV-2 variants is essential for tracking and preventing future pandemics. By examining data from the large-scale global sequencing effort for SARS-CoV-2, we find that polymorphisms in SARS-CoV-2 are context-dependent and significantly correlated across different variants, with neighboring nucleotides capable of altering polymorphism frequency by an average of 72-fold. Incorporating context-dependent patterns into evolutionary simulations improves the ability to predict polymorphisms in SARS-CoV-2 by 94% and reveals relatively immutable regions in NSP3, NSP13, and the spike protein that are potential targets for gene therapy. Subdividing SARS-CoV-2 into Persistent and Transient variants reveals that Persistent variants carry an excess of unique polymorphisms in the hand domain of the RNA-dependent RNA polymerase (p = 0.001). Overall, our work highlights the importance of context-dependent polymorphisms in the evolution of the SARS-CoV-2 genome, associates genetic signatures with variant persistence, and identifies static regions and motifs that can be used to design long-lasting antivirals that rely on sequence specificity.
Caraway et al. (Sun,) studied this question.
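
To make the idea of context dependence concrete, the following is a minimal sketch of how polymorphism frequency could be tabulated by neighboring nucleotides. It is not the authors' pipeline. It assumes a trinucleotide window (one flanking base on each side), 0-based site positions, and a toy reference sequence; the function names `context_rates` and `fold_change_by_center` and all data are hypothetical.

```python
from collections import defaultdict


def context_rates(reference, polymorphic_sites):
    """Return the fraction of polymorphic sites for each trinucleotide context.

    Assumption: a context is the site plus one flanking base on each side,
    and positions in polymorphic_sites are 0-based indices into reference.
    """
    totals = defaultdict(int)
    hits = defaultdict(int)
    # Skip the first and last base because they lack a full context.
    for i in range(1, len(reference) - 1):
        ctx = reference[i - 1:i + 2]
        totals[ctx] += 1
        if i in polymorphic_sites:
            hits[ctx] += 1
    return {ctx: hits[ctx] / totals[ctx] for ctx in totals}


def fold_change_by_center(rates):
    """For each central base, divide the highest nonzero rate by the lowest.

    Central bases with fewer than two nonzero contexts are omitted,
    since no ratio can be formed for them.
    """
    by_center = defaultdict(list)
    for ctx, rate in rates.items():
        if rate > 0:
            by_center[ctx[1]].append(rate)
    return {base: max(r) / min(r) for base, r in by_center.items() if len(r) > 1}


# Toy data (hypothetical): a short reference and three polymorphic positions.
reference = "ACGTCGACGTCG"
polymorphic_sites = {1, 4, 10}

rates = context_rates(reference, polymorphic_sites)
fold = fold_change_by_center(rates)
print(rates["ACG"], rates["TCG"])  # 0.5 1.0
print(fold)                        # {'C': 2.0}
```

In this toy case the central base C is polymorphic twice as often when preceded by T as when preceded by A. A real analysis would need aligned genomes, per-variant polymorphism calls, and a statistical test, none of which this sketch attempts.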