What does this research mean for the field?

The 'dark genome' is a significant source of targetable, immunogenic cancer neoepitopes that can be systematically identified at scale using the PEPMatch high-throughput peptide search tool integrated with comprehensive sequence databases. Novelty: methodological. Consensus alignment: establishes a new direction.

What question did this study set out to answer?

To systematically identify and validate noncanonical cancer neoepitopes originating from the dark genome.

April 5, 2026

Abstract 47: Identifying noncanonical sources of cancer neoepitopes using PEPMatch and CEDAR.

Key Points

To systematically identify and validate noncanonical cancer neoepitopes originating from the dark genome.
Utilized PEPMatch integrated with the Cancer Epitope Database and Analysis Resource (CEDAR).
Compiled databases containing both canonical and noncanonical protein sources.
Performed searches on 28,601 neoepitopes tested in 40,800 T cell assays.
Validated the pipeline on 840 MHC class I cryptic neoepitopes identified by mass spectrometry.
Achieved a mapping success of 98.33% for cryptic neoepitopes.
1.91% of all CEDAR neoepitopes originated from noncanonical sources.
Immunogenic (positive) neoepitopes mapped to noncanonical databases significantly more often than negative peptides (for example, cDNA: 8.8% vs 2.6%).
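
The sequential attribution idea behind these mapping percentages can be illustrated with a minimal sketch. It assumes that each peptide is assigned to the first database, in a fixed priority order, that contains it as an exact substring. This is plain Python substring matching, not PEPMatch's own API, and the function names, database order, sequences, and peptides below are all hypothetical toy values.

```python
from collections import Counter
from typing import Dict, Optional, Sequence, Tuple

# (label, sequences) pairs; the list order is the assumed search priority.
Database = Tuple[str, Sequence[str]]


def attribute_source(peptide: str, databases: Sequence[Database]) -> Optional[str]:
    """Return the label of the first database with an exact match, else None."""
    for label, sequences in databases:
        if any(peptide in seq for seq in sequences):
            return label
    return None


def attribution_summary(
    peptides: Sequence[str], databases: Sequence[Database]
) -> Dict[str, float]:
    """Percentage of peptides attributed to each database (or 'unmapped')."""
    counts = Counter(attribute_source(p, databases) or "unmapped" for p in peptides)
    return {label: 100.0 * n / len(peptides) for label, n in counts.items()}


# Hypothetical toy data: canonical source first, noncanonical sources after.
toy_databases = [
    ("UniProtKB", ["MKTAYIAKQRQISFVKSHFSRQ"]),
    ("ncORF", ["MLLPWRSTVNAG"]),
    ("cDNA", ["GGSLLPWRSAAKTAYIAK"]),
]
toy_peptides = ["TAYIAKQRQ", "LLPWRSTVN", "SLLPWRSAA", "WWWWWWWWW"]

summary = attribution_summary(toy_peptides, toy_databases)
print(summary)
```

Because the search stops at the first hit, a peptide present in both a canonical and a noncanonical database is credited only to the canonical one, so the noncanonical percentages count peptides that could not be explained by earlier sources.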

Abstract

Emerging evidence suggests that immunogenic cancer antigens may originate from the "dark genome", which comprises non-coding regions, alternative open reading frames (ORFs), and untranslated regions (UTRs) that are traditionally excluded from most neoepitope prediction pipelines. Consequently, systematic identification and validation of these noncanonical antigens have been technically challenging. To address this gap, we leveraged PEPMatch, a high-throughput peptide search tool, integrated with the Cancer Epitope Database and Analysis Resource (CEDAR) to systematically identify noncanonical sources of experimentally validated neoepitopes. We compiled seven comprehensive human databases encompassing both canonical and noncanonical protein sources: UniProtKB (human reference proteome with isoforms), UniParc (UniProt protein archive), Ensembl canonical proteins (ENSP), validated noncanonical ORFs (ncORF), three-frame translations of complementary DNAs (cDNA), three-frame translations of non-coding RNAs (ncRNA), and six-frame translations of full gene sequences, including UTRs and introns (ENSG). Using PEPMatch's exact matching algorithm, we searched 28,601 CEDAR neopeptides tested in 40,800 T cell assays against all seven databases. To validate our pipeline, we first analyzed 840 MHC class I cryptic neopeptides found by mass spectrometry, achieving a total mapping success of 98.33% using a sequential attribution search that revealed an enrichment in noncanonical sources (ncORF: 14.76%, cDNA: 41.55%, ncRNA: 4.76%, ENSG: 1.67%). Of these cryptic neoepitopes, 5.60% were found in the human reference proteome and 28.93% in the UniParc database before the search reached the noncanonical sources. Applying this methodology to all 28,601 CEDAR neopeptides revealed that 1.91% originated from noncanonical sources.

We then focused on the relevant subset of 6,394 positive neoepitopes with immunogenicity confirmed in T cell assays. Notably, a significantly higher percentage of these immunogenic neoepitopes than of the negative peptides mapped to noncanonical databases (ncORF: 0.1% vs 0.0%, cDNA: 8.8% vs 2.6%, ncRNA: 1.4% vs 0.6%, ENSG: 3.4% vs 1.5%), suggesting that the 'dark genome' is a source of targetable neoepitopes. The specificity of this mapping approach was validated using randomly shuffled peptide controls, which yielded 1% spurious matches. These findings demonstrate that PEPMatch successfully identifies noncanonical genomic origins of cancer neoepitopes at scale, given appropriately curated database sources. CEDAR is implementing this annotation pipeline to provide researchers with mappings for dark genome antigens, enabling validation and discovery of unconventional targets for immunotherapy development. This work expands the targetable landscape of cancer immunotherapy by systematically cataloging epitopes from previously overlooked genomic regions.

Citation Format: Daniel Marrama, Ibel Carri, Nina Blazeska, Randi Vita, Hannah K. Carter, Morten Nielsen, Alessandro Sette, Zeynep Kosaloglu-Yalcin, Bjoern Peters. Identifying noncanonical sources of cancer neoepitopes using PEPMatch and CEDAR [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2026; Part 1 (Regular Abstracts); 2026 Apr 17-22; San Diego, CA. Philadelphia (PA): AACR; Cancer Res 2026;86(7 Suppl):Abstract nr 47.
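
The shuffled-peptide specificity control mentioned in the abstract can be sketched roughly as follows. The sketch assumes that each control is made by randomly permuting the residues of a real peptide, which keeps length and amino acid composition but scrambles the order, and that the spurious match rate is the percentage of controls that still match exactly. The abstract does not describe how the authors generated their controls, so the function names, the seed, and the procedure itself are assumptions, and the matching is plain substring search rather than PEPMatch.

```python
import random
from typing import Sequence


def shuffle_peptide(peptide: str, rng: random.Random) -> str:
    """Randomly permute residues; length and composition are preserved."""
    residues = list(peptide)
    rng.shuffle(residues)
    return "".join(residues)


def match_rate(peptides: Sequence[str], sequences: Sequence[str]) -> float:
    """Percentage of peptides found as an exact substring of any sequence."""
    hits = sum(any(p in seq for seq in sequences) for p in peptides)
    return 100.0 * hits / len(peptides)


def shuffled_control_rate(
    peptides: Sequence[str], sequences: Sequence[str], seed: int = 0
) -> float:
    """Match rate of shuffled controls, an estimate of spurious matching."""
    rng = random.Random(seed)  # fixed seed so the control set is reproducible
    controls = [shuffle_peptide(p, rng) for p in peptides]
    return match_rate(controls, sequences)


# Hypothetical toy data.
toy_sequences = ["MKTAYIAKQRQISFVKSHFSRQ", "GGSLLPWRSAAKTAYIAK"]
toy_peptides = ["TAYIAKQRQ", "SLLPWRSAA", "QISFVKSHF"]

observed = match_rate(toy_peptides, toy_sequences)
control = shuffled_control_rate(toy_peptides, toy_sequences)
print(f"observed: {observed:.1f}%  shuffled control: {control:.1f}%")
```

A shuffle can occasionally reproduce the original peptide or another real sequence by chance, so on real data the control rate is a background estimate and not a guaranteed zero.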
