The AlphaFold Protein Structure Database, containing predictions for over 200 million proteins, has been met with enthusiasm over its potential to enrich structural biology research and beyond. Currently, effective use of the database is limited by the lack of tools for efficiently traversing, discovering, and documenting its contents. Identifying domain regions in the database is non-trivial, but doing so will aid our understanding of protein structure and function and facilitate drug discovery and comparative genomics. Here, we describe a deep learning method for domain segmentation called Merizo, which learns to cluster residues into domains in a bottom-up manner. Merizo is trained on CATH domains and fine-tuned on AlphaFold2 models via self-distillation, enabling it to be applied to both experimental and AlphaFold2 models. As proof of concept, we apply Merizo to the human proteome, identifying 40,818 putative domains that can be matched to CATH representative domains.
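To make the idea of residue-level domain segmentation concrete, the following is a minimal sketch of how a per-residue domain assignment could be turned into domain boundaries. It is not Merizo's interface or output format: the function name `labels_to_domains`, the integer label encoding, and the convention that label 0 marks non-domain residues are all assumptions made for illustration.

```python
from itertools import groupby


def labels_to_domains(labels):
    """Group per-residue domain labels into residue ranges.

    `labels` holds one integer per residue (hypothetical encoding):
    0 means "not in any domain", any other value is a domain ID.
    Returns {domain_id: [(start, end), ...]} with 1-based, inclusive
    residue positions. A domain may own several ranges, which is how
    a discontinuous domain shows up.
    """
    domains = {}
    pos = 1  # 1-based index of the first residue in the current run
    for label, run in groupby(labels):
        length = len(list(run))
        if label != 0:  # skip non-domain residues (assumed convention)
            domains.setdefault(label, []).append((pos, pos + length - 1))
        pos += length
    return domains


# Toy 10-residue chain: domain 1 is split in two by domain 2 and a linker.
example = [1, 1, 1, 2, 2, 0, 0, 1, 1, 2]
print(labels_to_domains(example))
```

Keeping a list of ranges per domain, rather than a single start and end, lets the same structure describe both contiguous and discontinuous domains.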
Andy M. Lau
Shaun M. Kandathil
David T. Jones
Nature Communications
University College London
Lau et al. studied this question.
www.synapsesocial.com/papers/69ff7e42b124fe581985779c — DOI: https://doi.org/10.1038/s41467-023-43934-4