What question did this study set out to answer?

The aim is to evaluate the effectiveness of the Embedding-Based Alignment method for detecting remote homologs in multifunctional human proteins.

January 22, 2026 | Open Access

Embedding-Based Alignments Capture Structural and Sequence Domains of Distantly Related Multifunctional Human Proteins

Key Points

The aim is to evaluate the effectiveness of the Embedding-Based Alignment method for detecting remote homologs in multifunctional human proteins.
The study used Embedding-Based Alignment (EBA) for pairwise protein comparisons.
EBA was applied to a set of randomly selected multifunctional human proteins.
Hits were grouped with a clustering procedure and validated against the reference Swiss-Prot database.
Protein sequence length similarity was treated as a constraint for remote homolog detection.
EBA retrieved remote homologs with structural and functional features similar to those of the query protein.
According to recent statistics, multifunctional proteins comprise up to 50% of the human reference proteome.

Abstract

A protein embedding is a representation of a protein that carries information distilled from the large volumes of sequences stored in sequence archives. Typically, the protein is represented by a matrix in which each residue is a context-specific vector whose dimensionality reflects the size of the neural network architecture (a transformer) trained by deep learning on those sequences. A recently introduced method, Embedding-Based Alignment (EBA), is particularly suited to pairwise embedding comparisons and, as we report here, allows remote homolog detection under specific constraints, including protein sequence length similarity. Multifunctional proteins are present in different species. In humans, however, their structural and functional annotation is particularly urgent because, according to recent statistics, they comprise up to 50% of the human reference proteome. In this paper we show that when EBA is applied to a set of randomly selected multifunctional human proteins, it retrieves, after a clustering procedure and rigorous validation on the reference Swiss-Prot database, proteins that are remote homologs of each other and carry structural and functional features similar to those of the query protein.
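To make the idea of a pairwise embedding comparison concrete, the following is a minimal sketch, not the published EBA implementation. It assumes each protein is given as a list of per-residue vectors, scores residue pairs with exp(-Euclidean distance), and finds the best-scoring global alignment by dynamic programming. The function names, the zero gap penalty, and the 0.8 length-ratio cutoff are all hypothetical choices made for illustration.

```python
import math


def similarity_matrix(emb_a, emb_b):
    """Residue-by-residue similarity: exp(-Euclidean distance), in (0, 1]."""
    return [[math.exp(-math.dist(u, v)) for v in emb_b] for u in emb_a]


def alignment_score(sim, gap_penalty=0.0):
    """Best global alignment score over a similarity matrix (dynamic programming).

    gap_penalty=0.0 is an illustrative assumption, not a value from the paper.
    """
    n, m = len(sim), len(sim[0])
    # score[i][j]: best score aligning the first i residues of A
    # with the first j residues of B.
    score = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = score[i - 1][0] - gap_penalty
    for j in range(1, m + 1):
        score[0][j] = score[0][j - 1] - gap_penalty
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sim[i - 1][j - 1],  # align residue i with j
                score[i - 1][j] - gap_penalty,            # gap in B
                score[i][j - 1] - gap_penalty,            # gap in A
            )
    return score[n][m]


def length_ratio(len_a, len_b):
    """Shorter length divided by longer length, in (0, 1]."""
    return min(len_a, len_b) / max(len_a, len_b)


def compare(emb_a, emb_b, min_length_ratio=0.8):
    """Length-normalized alignment score, or None if lengths differ too much.

    min_length_ratio=0.8 is a hypothetical cutoff standing in for the
    sequence length similarity constraint mentioned in the abstract.
    """
    if length_ratio(len(emb_a), len(emb_b)) < min_length_ratio:
        return None
    sim = similarity_matrix(emb_a, emb_b)
    return alignment_score(sim) / min(len(emb_a), len(emb_b))


# Toy 2-dimensional "embeddings"; real per-residue vectors are far larger.
protein_a = [[0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
protein_b = [[0.1, 1.0], [1.0, 0.1], [0.9, 1.0]]
print(compare(protein_a, protein_b))
```

In this sketch, two proteins with identical embeddings score exactly 1.0, and the score falls toward 0 as the aligned residue vectors move apart. Pairs that fail the length check are skipped rather than scored.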
