Tandem mass spectrometry (MS/MS) is a core technology for small-molecule structural elucidation in nontargeted metabolomics. However, the limited coverage of experimental spectral libraries is a major obstacle to MS-based small-molecule identification. Spectrum prediction methods offer a promising alternative, but existing approaches are often limited by their combinatorial enumeration strategies or by insufficient molecular representation capabilities, which leads to poor generalizability. To address this, we propose HDSE-MS, an MS/MS spectrum prediction model that integrates a message passing neural network (MPNN) with a Transformer architecture enhanced by hierarchical distance structural encoding (HDSE). Using graph coarsening, the model transforms molecular graphs into multilevel cluster structures and encodes the hierarchical distances between clusters as structural biases in the Transformer. This enables joint modeling of molecular substructures, their hierarchical relationships, and long-range dependencies, thereby improving the model's ability to represent complex molecular structures. We conducted a comprehensive evaluation of HDSE-MS on three benchmark data sets: NIST23, MoNA, and MassBank. Experimental results show that HDSE-MS outperforms existing methods, achieving mean spectral entropy similarities of 0.759, 0.567, and 0.483 on the three data sets, respectively, under the [M+H]+ ionization mode. Furthermore, on the external CASMI2022 test set, HDSE-MS achieved a Rank of 220.8 and a Top-1 accuracy of 0.098, demonstrating its strong predictive accuracy, robust generalization, and scalability. The source code is publicly available at https://github.com/lzjforyou/HDSE-MS, and an interactive web service is accessible at https://huggingface.co/spaces/liuzhijin/hdse-ms-attn-viz.
Liu et al. studied this question.
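The headline metric in the abstract is spectral entropy similarity between predicted and reference spectra. As a rough illustration of how such a score can be computed, the following is a minimal sketch of the unweighted form of the metric, 1 - (2·S_AB - S_A - S_B) / ln 4, where S is the Shannon entropy of the normalized intensities and AB is the 1:1 mixture of the two spectra. The function names, the fixed-width m/z binning used to match peaks, and the `bin_width` default are assumptions made for this example; they are not taken from the HDSE-MS code base, and the authors' evaluation may use a different peak-matching or intensity-weighting scheme.

```python
import math
from collections import defaultdict


def bin_spectrum(peaks, bin_width=0.01):
    """Sum intensities per m/z bin and normalize so the intensities sum to 1.

    `peaks` is an iterable of (mz, intensity) pairs. Fixed-width binning is an
    assumption of this sketch, used as a simple way to align peaks.
    """
    binned = defaultdict(float)
    for mz, intensity in peaks:
        if intensity > 0:
            binned[round(mz / bin_width)] += intensity
    total = sum(binned.values())
    if total == 0:
        raise ValueError("spectrum has no positive intensities")
    return {b: v / total for b, v in binned.items()}


def shannon_entropy(dist):
    """Shannon entropy (natural log) of a normalized intensity distribution."""
    return -sum(p * math.log(p) for p in dist.values() if p > 0)


def entropy_similarity(peaks_a, peaks_b, bin_width=0.01):
    """Unweighted spectral entropy similarity in the range [0, 1]."""
    a = bin_spectrum(peaks_a, bin_width)
    b = bin_spectrum(peaks_b, bin_width)
    # 1:1 mixture of the two normalized spectra.
    merged = {k: 0.5 * (a.get(k, 0.0) + b.get(k, 0.0)) for k in set(a) | set(b)}
    s_ab = shannon_entropy(merged)
    s_a = shannon_entropy(a)
    s_b = shannon_entropy(b)
    return 1.0 - (2.0 * s_ab - s_a - s_b) / math.log(4.0)
```

With this definition, two identical spectra score 1, two spectra with no shared bins score 0, and `entropy_similarity([(100.0, 1.0), (150.0, 1.0)], [(100.0, 1.0), (200.0, 1.0)])` evaluates to 0.5 (up to floating-point error) because half of the normalized intensity is shared.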