Proteogenomics is a transformative approach for deciphering novel coding regions through integration of genomic, transcriptomic, and proteomic data. Here, we present pAnno, an end-to-end workflow designed to uncover hidden protein-coding elements with high precision and efficiency. pAnno generates customized protein databases by integrating multi-omic data, employs a multi-stage iterative open search strategy, and incorporates an efficient peptide-to-coding sequence mapping algorithm. Despite a 50-fold increase in database size, pAnno maintains high sensitivity and accuracy in peptide identification and achieves genomic localization of novel events with only ~3% additional processing time, delivering unprecedented resolution and speed in proteogenomic analysis. By detecting splicing, mutations, and novel protein isoforms, pAnno supports various downstream applications and reveals overlooked events, identifying 1.73× more novel proteins in Pyrus and 34× more non-canonical HLA peptides in lung cancer. These capabilities position pAnno as a gold-standard proteogenomic workflow, excelling in non-canonical coding discovery and large-scale database processing.
Wang et al. studied this question.