Abstract
Emerging evidence suggests that immunogenic cancer antigens may originate from the "dark genome", which comprises non-coding regions, alternative open reading frames (ORFs), and untranslated regions (UTRs) that are traditionally excluded from most neoepitope prediction pipelines. Consequently, systematically identifying and validating these noncanonical antigens has been technically challenging. To address this gap, we integrated PEPMatch, a high-throughput peptide search tool, with the Cancer Epitope Database and Analysis Resource (CEDAR) to systematically identify noncanonical sources of experimentally validated neoepitopes. We compiled seven comprehensive human databases encompassing both canonical and noncanonical protein sources: UniProtKB (human reference proteome with isoforms), UniParc (UniProt protein archive), Ensembl canonical proteins (ENSP), validated noncanonical ORFs (ncORF), three-frame translations of complementary DNAs (cDNA), three-frame translations of non-coding RNAs (ncRNA), and six-frame translations of full gene sequences, including UTRs and introns (ENSG). Using PEPMatch's exact matching algorithm, we searched 28,601 CEDAR neopeptides, tested in 40,800 T cell assays, against all seven databases. To validate our pipeline, we first analyzed 840 MHC class I cryptic neopeptides identified by mass spectrometry. A sequential attribution search, in which canonical databases were queried before noncanonical ones, mapped 98.33% of these peptides and revealed an enrichment in noncanonical sources (ncORF: 14.76%, cDNA: 41.55%, ncRNA: 4.76%, ENSG: 1.67%). Of these cryptic neoepitopes, 5.60% were attributed to the human reference proteome and 28.93% to UniParc, both of which were searched before the noncanonical sources. Applying this methodology to all 28,601 CEDAR neopeptides revealed that 1.91% originated from noncanonical sources. We then focused on the subset of 6,394 positive neoepitopes whose immunogenicity was confirmed in T cell assays.
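The database-construction and sequential-attribution steps can be illustrated with a short Python sketch. It is not PEPMatch's implementation: naive exact substring search stands in for PEPMatch's own algorithm, the priority order of the databases is assumed to follow the order in which they are listed above, and the helper names (`three_frame`, `six_frame`, `attribute`) and all sequences are hypothetical toy examples. Translation uses the standard genetic code with stop codons rendered as `*`, so an exact match can never span a stop.

```python
from itertools import product
from typing import Optional

# Standard genetic code (NCBI table 1); codons enumerated in TCAG order.
_AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product("TCAG", repeat=3), _AMINO_ACIDS)}
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(dna: str) -> str:
    return dna.translate(_COMPLEMENT)[::-1]


def translate(dna: str) -> str:
    """Translate one frame; stop codons -> '*', unknown codons (e.g. with N) -> 'X'."""
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X")
                   for i in range(0, len(dna) - 2, 3))


def three_frame(dna: str) -> list[str]:
    """Forward-strand frames only, as for stranded cDNA/ncRNA transcripts."""
    dna = dna.upper()
    return [translate(dna[offset:]) for offset in range(3)]


def six_frame(dna: str) -> list[str]:
    """Both strands, as for full gene sequences (UTRs and introns included)."""
    dna = dna.upper()
    return three_frame(dna) + three_frame(reverse_complement(dna))


def attribute(peptide: str, databases: list[tuple[str, list[str]]]) -> Optional[str]:
    """Return the first database, in priority order, holding an exact match.

    Hypothetical helper: naive substring search stands in for PEPMatch.
    Peptides never contain '*' or 'X', so a match cannot run through a stop.
    """
    for name, sequences in databases:
        if any(peptide in seq for seq in sequences):
            return name
    return None  # unmapped


# Hypothetical toy databases, listed canonical-first (order is an assumption).
proteome = ["MKTAYIAKQR"]
cdna = "G" + "TCTATTATCAATTTTGAAAAACTG" + "A"  # SIINFEKL in the 2nd forward frame
gene = reverse_complement("TCTATTATCAATTTTGAAAAACTG")  # SIINFEKL on reverse strand
dbs = [("UniProtKB", proteome), ("cDNA", three_frame(cdna)), ("ENSG", six_frame(gene))]

for pep in ["TAYIAK", "SIINFEKL", "FFKIDN", "WWWWWWWW"]:
    print(pep, attribute(pep, dbs))
# TAYIAK UniProtKB
# SIINFEKL cDNA
# FFKIDN ENSG
# WWWWWWWW None
```

Because `SIINFEKL` occurs in both the toy cDNA and the toy gene, sequential attribution assigns it to whichever database comes first; reordering `dbs` changes the attribution, which is why the search order matters when interpreting per-database percentages.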
Notably, a higher percentage of the immunogenic positive neoepitopes than of the negative peptides mapped to noncanonical databases (ncORF: 0.1% vs 0.0%, cDNA: 8.8% vs 2.6%, ncRNA: 1.4% vs 0.6%, ENSG: 3.4% vs 1.5%), suggesting that the "dark genome" is a source of targetable neoepitopes. The specificity of this mapping approach was validated using randomly shuffled peptide controls, which yielded 1% spurious matches. These findings demonstrate that, given appropriately curated database sources, PEPMatch successfully identifies the noncanonical genomic origins of cancer neoepitopes at scale. CEDAR is implementing this annotation pipeline to provide researchers with mappings of dark genome antigens, enabling the validation and discovery of unconventional targets for immunotherapy development. This work expands the targetable landscape of cancer immunotherapy by systematically cataloging epitopes from previously overlooked genomic regions.
Citation Format: Daniel Marrama, Ibel Carri, Nina Blazeska, Randi Vita, Hannah K. Carter, Morten Nielsen, Alessandro Sette, Zeynep Kosaloglu-Yalcin, Bjoern Peters. Identifying noncanonical sources of cancer neoepitopes using PEPMatch and CEDAR [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2026; Part 1 (Regular Abstracts); 2026 Apr 17-22; San Diego, CA. Philadelphia (PA): AACR; Cancer Res 2026;86(7 Suppl):Abstract nr 47.
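The shuffled-peptide specificity control can be sketched in the same spirit. The abstract does not state how decoys were generated or which databases they were searched against; this sketch assumes composition-preserving shuffles searched by naive exact substring matching against a single list of sequences, and `shuffle_peptide` and `spurious_match_rate` are hypothetical names.

```python
import random


def shuffle_peptide(peptide: str, rng: random.Random, max_tries: int = 10) -> str:
    """Permute residues (same length and composition); retry if unchanged."""
    residues = list(peptide)
    for _ in range(max_tries):
        rng.shuffle(residues)
        if "".join(residues) != peptide:
            break
    # Homopolymers such as "AAAAAAAA" cannot change and are returned as-is.
    return "".join(residues)


def spurious_match_rate(peptides: list[str], database: list[str], seed: int = 0) -> float:
    """Fraction of shuffled decoys that still match a database sequence exactly.

    Hypothetical helper: naive substring search, single database assumed.
    """
    if not peptides:
        return 0.0
    rng = random.Random(seed)  # fixed seed keeps the control reproducible
    decoys = [shuffle_peptide(p, rng) for p in peptides]
    hits = sum(any(decoy in seq for seq in database) for decoy in decoys)
    return hits / len(decoys)


print(spurious_match_rate(["SIINFEKL", "AAAAAAAA"], ["GGGGAAAAAAAAGG"]))
# 0.5: the SIINFEKL decoy has no match, the unshuffleable homopolymer does
```

Shuffling preserves peptide length and amino-acid composition, so the decoy match rate approximates how often a peptide of that composition matches by chance rather than because of its specific sequence.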