December 29, 2023, Open Access

Embedding-based alignment: combining protein language models with dynamic programming alignment to detect structural similarities in the twilight-zone

Abstract

MOTIVATION: Language models are routinely used for text classification and generative tasks. Recently, the same architectures have been applied to protein sequences, unlocking powerful new approaches in bioinformatics. Protein language models (pLMs) generate high-dimensional per-residue embeddings that encode a "semantic meaning" of each amino acid in the context of the full protein sequence. These representations have been used as a starting point for downstream learning tasks and, more recently, for identifying distant homologous relationships between proteins.

RESULTS: In this work, we introduce a new method that generates embedding-based protein sequence alignments (EBA) and show that these alignments capture structural similarities even in the twilight zone, outperforming both classical methods and other pLM-based approaches. The method shows excellent accuracy despite the absence of training and parameter optimization. We demonstrate that combining pLMs with alignment methods is a valuable approach for detecting relationships between proteins in the twilight zone.

AVAILABILITY AND IMPLEMENTATION: The code to run EBA and reproduce the analysis described in this article is available at https://git.scicore.unibas.ch/schwede/EBA and https://git.scicore.unibas.ch/schwede/ebabenchmark.
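The general idea of pairing per-residue embeddings with dynamic programming can be illustrated with a minimal sketch. This is not the authors' EBA implementation (see the repositories above for that). It assumes cosine similarity as the residue-residue score, a linear gap penalty, and a global (Needleman-Wunsch) recursion. The names `cosine`, `align_embeddings`, and the `gap` value are hypothetical, and the toy vectors stand in for embeddings that would in practice come from a pLM.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (assumed score)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def align_embeddings(emb_a, emb_b, gap=-0.5):
    """Global alignment over a residue-residue embedding similarity matrix.

    emb_a, emb_b: lists of per-residue embedding vectors (one per residue).
    gap: linear gap penalty (hypothetical value, not taken from the paper).
    Returns (score, pairs), where pairs holds aligned (i, j) residue indices.
    """
    n, m = len(emb_a), len(emb_b)
    # Similarity matrix replaces the usual substitution matrix lookup.
    sim = [[cosine(a, b) for b in emb_b] for a in emb_a]

    # Dynamic programming table with gap-initialized borders.
    score = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sim[i - 1][j - 1],  # align i with j
                score[i - 1][j] + gap,                    # gap in sequence b
                score[i][j - 1] + gap,                    # gap in sequence a
            )

    # Traceback from the bottom-right corner to recover aligned pairs.
    pairs = []
    i, j = n, m
    while i > 0 and j > 0:
        if score[i][j] == score[i - 1][j - 1] + sim[i - 1][j - 1]:
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
    pairs.reverse()
    return score[n][m], pairs


# Toy data: five orthogonal unit vectors stand in for per-residue embeddings;
# the second "protein" is residues 1..3 of the first.
emb_a = [[1.0 if k == r else 0.0 for k in range(5)] for r in range(5)]
emb_b = emb_a[1:4]
total, pairs = align_embeddings(emb_a, emb_b)
print(total, pairs)
```

In this toy case the alignment pairs residues 1 to 3 of the first sequence with residues 0 to 2 of the second and leaves the two flanking residues unaligned. With real pLM embeddings the similarity values are continuous, so the choice of similarity measure and gap penalty matters much more than it does here.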

Authors

Lorenzo Pantolini

Gabriel Studer

Joana Pereira

Journals

Bioinformatics

Institutions

University of Basel

SIB Swiss Institute of Bioinformatics
