Abstract

Background: The identification of immunogenic cancer epitopes is the foundation for advancing effective cancer immunotherapies. To provide the community with a centralized data source, the Cancer Epitope Database and Analysis Resource (CEDAR, cedar.iedb.org) maintains the most extensive and actively updated collection of experimentally validated cancer epitope data. Currently, CEDAR hosts data from over 6,240 peer-reviewed publications, encompassing T cell, B cell, and MHC ligand assays. While traditional neoepitope discovery has focused on the canonical exome, a rapidly growing body of evidence indicates that the "dark genome", comprising non-coding regions, alternative open reading frames (ORFs), and untranslated regions (UTRs), is a significant source of targetable neoantigens.

Methods: To accurately curate and incorporate this expanding source of immunological data into CEDAR, we leveraged PEPMatch, a high-throughput peptide search tool. We compiled seven comprehensive human databases: UniProtKB, UniParc, Ensembl canonical proteins (ENSP), validated noncanonical ORFs (ncORF), three-frame translations of complementary DNAs (cDNA) and non-coding RNAs (ncRNA), and six-frame translations of full gene sequences, including UTRs and introns (ENSG). Using PEPMatch's matching algorithm, we searched CEDAR neopeptides against these databases to identify their genomic origins.

Results: To validate this pipeline, we first analyzed 840 MHC class I cryptic neopeptides identified by mass spectrometry. A sequential attribution search achieved a total mapping success of 98.33% and revealed an enrichment in noncanonical sources (ncORF: 14.76%, cDNA: 41.55%, ncRNA: 4.76%, ENSG: 1.67%). Applying this methodology to all 28,601 CEDAR neopeptides revealed that 1.91% originated from noncanonical sources. We then focused on the subset of 6,394 positive neoepitopes with confirmed immunogenicity in T cell assays. Notably, a larger percentage of these immunogenic neoepitopes than of negative peptides mapped to noncanonical databases (ncORF: 0.1% vs 0.0%, cDNA: 8.8% vs 2.6%, ncRNA: 1.4% vs 0.6%, ENSG: 3.4% vs 1.5%), suggesting that the "dark genome" is a source of immunogenic neoepitopes. The specificity of this mapping approach was validated using randomly shuffled peptide controls, which yielded 1% spurious matches.

Conclusions: These findings demonstrate that PEPMatch successfully identifies noncanonical genomic origins of cancer neoepitopes at scale, given appropriately curated database sources. CEDAR is implementing this annotation pipeline to provide researchers with mappings for dark genome antigens. This work expands the targetable landscape of cancer immunotherapy by cataloging epitopes from previously overlooked genomic regions, enabling the discovery of unconventional targets for the next generation of personalized cancer vaccines and TCR-based therapies.

Citation Format: Ibel Carri, Daniel Marrama, Nina Blazeska, Randi Vita, Hannah K. Carter, Morten Nielsen, Alessandro Sette, Zeynep Koşaloğlu-Yalçın, Bjoern Peters. Systematic identification of noncanonical neoantigens from the dark genome in CEDAR using PEPMatch [abstract]. In: Proceedings of the AACR Immuno-Oncology Conference (AACR IO): Discovery and Innovation in Cancer Immunology: Revolutionizing Treatment through Immunotherapy; 2026 Feb 18-21; Los Angeles, CA. Philadelphia (PA): AACR; Cancer Immunol Res 2026;14(2 Suppl):Abstract nr LB-B007.
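The sequential attribution search and the shuffled-peptide control described in the Results can be illustrated with a minimal sketch. It rests on several assumptions that the abstract does not confirm: exact substring matching stands in for PEPMatch's matching algorithm, the priority order simply follows the order in which the databases are listed, and every function name and toy sequence below is hypothetical.

```python
import random

# ASSUMPTION: priority follows the order the databases are listed in the
# Methods (canonical sources first); the abstract does not state the order.
DB_ORDER = ["UniProtKB", "UniParc", "ENSP", "ncORF", "cDNA", "ncRNA", "ENSG"]


def attribute(peptide, databases, order=DB_ORDER):
    """Return the first database, in priority order, that contains the peptide.

    Exact substring search is a stand-in for the real peptide search tool.
    """
    for name in order:
        if any(peptide in seq for seq in databases.get(name, ())):
            return name
    return None  # peptide could not be mapped to any database


def attribution_fractions(peptides, databases, order=DB_ORDER):
    """Fraction of peptides attributed to each database, plus 'unmapped'."""
    counts = {name: 0 for name in order}
    counts["unmapped"] = 0
    for pep in peptides:
        counts[attribute(pep, databases, order) or "unmapped"] += 1
    total = max(len(peptides), 1)  # avoid division by zero on empty input
    return {name: n / total for name, n in counts.items()}


def shuffled_controls(peptides, seed=0):
    """Shuffle residues within each peptide, keeping length and composition."""
    rng = random.Random(seed)
    controls = []
    for pep in peptides:
        residues = list(pep)
        rng.shuffle(residues)
        controls.append("".join(residues))
    return controls


# Hypothetical toy data, for illustration only.
toy_dbs = {
    "ENSP": ["MKTAYIAKQRQISFVKSHFSRQ"],
    "ncORF": ["MLLPSGAWTRRQISFVK"],
    "ENSG": ["GGHWPLNVCDE"],
}
toy_peptides = ["RQISFVK", "PSGAWTR", "WPLNVC", "YYYYYYY"]

fractions = attribution_fractions(toy_peptides, toy_dbs)
controls = shuffled_controls(toy_peptides, seed=1)
spurious = attribution_fractions(controls, toy_dbs)
```

In this toy case "RQISFVK" occurs in both the ENSP and ncORF sequences but is attributed only to ENSP, which is the point of a sequential search: a peptide counts as noncanonical only when no higher-priority source explains it. Running the same attribution on the shuffled controls gives one way to estimate a spurious match rate as one minus the "unmapped" fraction.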
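The three-frame and six-frame translations mentioned in the Methods can be sketched as follows. This is an illustration under stated assumptions, not the pipeline's actual code: it uses the standard genetic code, keeps stop codons as "*" rather than splitting the translation into ORFs, and maps codons containing ambiguous bases to "X". How the real databases handle these cases is not described in the abstract, and all names below are hypothetical.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate(dna):
    """Translate one reading frame; unknown codons become 'X', stops stay '*'."""
    usable = len(dna) - len(dna) % 3  # drop a trailing partial codon
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, usable, 3)
    )


def three_frame(dna):
    """Translate the three forward reading frames of a sequence."""
    dna = dna.upper()
    return [translate(dna[offset:]) for offset in range(3)]


def six_frame(dna):
    """Translate three forward frames plus three reverse-complement frames."""
    dna = dna.upper()
    reverse_complement = dna.translate(COMPLEMENT)[::-1]
    return three_frame(dna) + three_frame(reverse_complement)
```

Three frames suffice when the strand is already known, as for cDNA and ncRNA transcripts, whereas a full gene sequence searched without regard to strand needs all six. For example, `six_frame("ATGGCCTAA")` returns the forward frames `"MA*"`, `"WP"`, `"GL"` followed by the reverse-complement frames `"LGH"`, `"*A"`, `"RP"`.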