Metabolite annotation, especially the discovery of unknown metabolites, remains a fundamental challenge in mass spectrometry-based untargeted metabolomics due to limited reference mass spectra. Here we present MetGenX, a structure-informed encoder-decoder neural network that enables efficient and controllable generation of metabolite structures directly from MS2 spectra. By reformulating the spectrum-to-structure task as a structure-to-structure generation problem, MetGenX significantly improves generation accuracy and chemical space coverage. In independent tests, it achieved top-1 accuracy of 55.9% on 1388 NIST MS2 spectra and 68.5% on 1681 spectra from real biological samples, outperforming existing in silico tools. Its structure-informed design ensures robust performance across both positive and negative ionization modes without retraining. Applying a multi-step annotation workflow to mouse liver untargeted metabolomics data, MetGenX identified two previously uncharacterized metabolites absent from major human metabolome databases. These results demonstrate MetGenX’s strong potential to advance de novo metabolite annotation and facilitate the discovery of uncharacterized chemical entities.
Wang et al. (Mon,) studied this question.