What question did this study set out to answer?

The research aims to improve predictions of protein variant effects related to SARS-CoV-2 by using machine learning. It seeks to model the RBD fitness landscape to understand evolutionary patterns.

May 8, 2026 | Open Access

Machine Learning-Driven Simulations of the SARS-CoV-2 Fitness Landscape from Deep Mutational Scanning Experiments

Key Points

Trained machine learning models using DMS libraries of SARS-CoV-2 RBD sequences labeled with ACE2 binding affinity.
Applied Markov Chain Monte Carlo simulations to characterize the RBD fitness landscape.
Compared machine learning predictions with traditional methods of averaging point mutation effects.
Machine learning models outperformed traditional methods in predicting combinatorial mutation effects.
Predicted fitness landscape aligns closely with high-fitness sequences from DMS data.
Successfully ranked Omicron variants when trained only on variants near the wild type.
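The "traditional" baseline named in the key points, adding or averaging the effects of point mutations, can be sketched as follows. This is a minimal illustration, not the study's code: the mutation labels and effect values are hypothetical placeholders, and the function names are assumptions introduced here.

```python
# Hypothetical single-mutation effects (change in binding score relative to
# the wild type). Labels and values are placeholders, not data from the study.
single_effects = {"mutA": 0.5, "mutB": -0.4, "mutC": 0.1}


def additive_prediction(mutations, effects):
    """Predict a multi-mutant's effect as the sum of its point-mutation effects."""
    return sum(effects[m] for m in mutations)


def average_prediction(mutations, effects):
    """Predict a multi-mutant's effect as the mean of its point-mutation effects.

    A variant with no mutations (the wild type) scores 0.0.
    """
    if not mutations:
        return 0.0
    return additive_prediction(mutations, effects) / len(mutations)


variant = ["mutA", "mutB"]
additive_score = additive_prediction(variant, single_effects)
average_score = average_prediction(variant, single_effects)
```

Both baselines treat mutations as independent, so neither can represent epistasis, where the effect of one mutation depends on which others are present. That limitation is the gap a supervised model trained on combinatorial variants is meant to close.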

Abstract

Predicting protein variant effects is a key challenge in preparing for pathogenic viral strains, understanding mutation-linked diseases, and designing new proteins. Protein sequence-structure-function relationships are difficult to model due to complex allosteric and epistatic effects. To investigate efficient modeling strategies, we trained supervised machine learning (ML) models with deep mutational scanning (DMS) libraries of SARS-CoV-2 receptor binding domain (RBD) sequences labeled with angiotensin-converting enzyme 2 (ACE2) binding affinity. These models predict combinatorial mutation effects better than adding or averaging the effects of point mutations, and they show strong extrapolative performance in ranking Omicron variants when trained only on variants near the wild type (WT). We characterize the RBD fitness landscape by combining ML with Markov Chain Monte Carlo simulations to predict evolutionary patterns from the WT sequence. These simulations generate sequence profiles comparable to those of high-fitness sequences in DMS data and predict mutations in unseen Omicron variants. The models provide insight into the relationship between RBD sequence elements and offer a new perspective on the use of DMS to predict emerging viral strains, which we anticipate will be applicable to other evolutionary prediction tasks. To facilitate application and future development of this strategy, we introduce Mavenets: https://github.com/SztainLab/mavenets.
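The idea of combining a fitness predictor with Markov Chain Monte Carlo can be illustrated with a short sketch. It assumes a Metropolis acceptance rule with single-point-mutation proposals, which the abstract does not specify, and it uses a toy scoring function in place of a trained model. None of the names below (`toy_fitness`, `metropolis_walk`, `target`) come from Mavenets; they are hypothetical and exist only for this example.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def toy_fitness(seq, target):
    """Stand-in for a trained ML model: fraction of positions matching `target`."""
    return sum(a == b for a, b in zip(seq, target)) / len(target)


def metropolis_walk(start, fitness, steps, temperature, rng):
    """Random walk over sequences using symmetric single-residue proposals.

    A proposal is always accepted if it does not lower the predicted fitness,
    and otherwise accepted with probability exp(delta / temperature).
    """
    current = start
    current_fit = fitness(current)
    trajectory = [current]
    for _ in range(steps):
        pos = rng.randrange(len(current))
        new_aa = rng.choice(AMINO_ACIDS)
        proposal = current[:pos] + new_aa + current[pos + 1:]
        proposal_fit = fitness(proposal)
        delta = proposal_fit - current_fit
        if delta >= 0 or rng.random() < math.exp(delta / temperature):
            current, current_fit = proposal, proposal_fit
        trajectory.append(current)
    return trajectory


rng = random.Random(0)  # fixed seed so the walk is reproducible
wild_type = "ACDEFGHIKL"  # placeholder sequence, not the real RBD
target = "ACDEYGHIKW"     # placeholder "high-fitness" sequence for the toy score
traj = metropolis_walk(
    wild_type, lambda s: toy_fitness(s, target), steps=2000, temperature=0.05, rng=rng
)
```

In a real application the `fitness` argument would be a model trained on DMS data, and per-position residue frequencies over the trajectory would give a sequence profile to compare against high-fitness DMS sequences. The temperature controls how readily the walk accepts fitness-lowering mutations.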
