Since the advent of methods based on artificial intelligence, predicting the 3D structure of most proteins is no longer a primary issue. Despite these spectacular successes, significant challenges remain, and we focus on two of them. The first is the incorporation of the temporal dimension, the 4D property. This is an intense field of research, and our work focuses specifically on flexibility analysis and prediction. The second is the high computational cost, in both model size and calculation time, associated with protein language models (pLMs).
To address protein dynamics, recent efforts have focused on leveraging large-scale, standardized molecular dynamics (MD) simulation data, such as those provided by our ATLAS database. Although the pLDDT confidence score from structure prediction tools has been proposed as a proxy for flexibility, large-scale studies show that its correlation with MD-derived flexibility is limited and that it performs poorly for proteins with interacting partners. To overcome this, our new deep learning model, PEGASUS, trained directly on MD data from ATLAS, can now accurately predict multiple flexibility metrics (e.g., RMSF) from sequence alone, offering a more reliable view of protein dynamics.
To tackle the computational burden of pLMs, we also explored the use of an adversarial autoencoder (AAE). This approach compresses the high-dimensional embeddings generated by pLMs into a low-dimensional, continuous, and structured latent space, which significantly reduces storage and computational costs for downstream tasks. Furthermore, the resulting organized latent space enables efficient, large-scale structural similarity searches and opens promising avenues for generative modeling, allowing the exploration and design of novel protein sequences by navigating this compressed representation.
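To make the flexibility metric mentioned above concrete, the following minimal Python sketch computes a per-residue RMSF (root-mean-square fluctuation) from a set of trajectory frames and compares it with a pLDDT-like score using a Pearson correlation. It is an illustration only, not the PEGASUS or ATLAS pipeline: the function names `rmsf` and `pearson`, the toy coordinates, and the pLDDT values are all hypothetical, and the frames are assumed to be already superposed on a common reference.

```python
import math


def rmsf(frames):
    """Per-residue RMSF from frames assumed to be already superposed.

    frames: list of frames; each frame is a list of (x, y, z) tuples,
    one per residue (for example, C-alpha positions).
    """
    n_frames = len(frames)
    n_res = len(frames[0])
    values = []
    for i in range(n_res):
        # Mean position of residue i over all frames.
        mean = [sum(f[i][k] for f in frames) / n_frames for k in range(3)]
        # Mean squared deviation of residue i from its mean position.
        msd = sum(
            sum((f[i][k] - mean[k]) ** 2 for k in range(3)) for f in frames
        ) / n_frames
        values.append(math.sqrt(msd))
    return values


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Hypothetical toy trajectory: 4 frames, 3 residues.
# Residue 0 is rigid; residues 1 and 2 oscillate along x with growing amplitude.
toy_frames = [
    [(0.0, 0.0, 0.0), (sign * 1.0, 0.0, 0.0), (sign * 2.0, 0.0, 0.0)]
    for sign in (1, -1, 1, -1)
]
# Hypothetical pLDDT-like confidence values, one per residue.
toy_plddt = [95.0, 80.0, 60.0]

toy_rmsf = rmsf(toy_frames)
print("RMSF per residue:", toy_rmsf)
print("Pearson r (RMSF vs pLDDT):", pearson(toy_rmsf, toy_plddt))
```

In this toy case the more mobile residues were given lower confidence values, so the correlation is strongly negative by construction. On real data the relationship is weaker, which is the limitation described above.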
In conclusion, these strategies for predicting dynamics and efficiently representing sequence information address two major post-AlphaFold challenges in computational biophysics.
Jean‐Christophe Gelly (Sun,) studied this question.