Abstract
Modern AI (Artificial Intelligence) methods offer new opportunities in pharmacology by enabling improved modeling of disease mechanisms and drug action learned from large and heterogeneous biological datasets. A central challenge is developing models that can jointly integrate disparate biomedical modalities. We introduce MAMMAL (Molecular Aligned Multi-Modal Architecture and Language), a foundation model for cross-modal learning, designed to address the challenges associated with drug discovery tasks. MAMMAL was pre-trained on 2 billion samples across protein and antibody sequences, small molecules, and gene expression profiles, and supports classification, regression, and generative tasks on cross-modal inputs. Across eleven benchmarks covering multiple stages of the drug discovery pipeline, MAMMAL achieves state-of-the-art performance on nine tasks and competitive results on two. In an antibody-antigen binding benchmark, fine-tuned MAMMAL prediction scores significantly outperform AlphaFold3 confidence scores, used here as a reference proxy for binding likelihood, in five of seven antigen targets. The MAMMAL framework and pretrained models are publicly available to support open and collaborative research.
Shoshan et al. (Mon,) studied this question.