Abstract

Neoantigen discovery is essential for personalized immunotherapy, but current approaches are limited by a focus on small somatic variants identifiable by short-read sequencing. These variants often produce peptides that resemble self-antigens and are weakly immunogenic. Long-read sequencing enables more sensitive detection of large structural variants and full-length transcripts. Large mutations can generate neoantigens that are more dissimilar to self and thus more immunogenic. However, no method exists to identify a comprehensive set of somatic DNA and RNA variants, integrate them, and contextualize each amino acid using long-read data. To address this gap, we developed Exacto, a publicly available software program that uses long-read sequencing data to accurately characterize tumor genomes, transcriptomes, and mutant proteoforms. Exacto performs three main functions. First, it profiles reference-genome-aligned long reads to identify major types of tumor-specific DNA variants (SNV, multi-nucleotide variant, insertion, deletion, duplication, inversion, and translocation) and RNA variants (SNV, multi-nucleotide variant, insertion, deletion, cryptic exon, intron retention, exon skipping / truncation, fusion gene, circular RNA, and unannotated intergenic isoforms). Second, it integrates RNA and DNA variants to predict the splicing consequences of somatic mutations. Third, it translates full-length RNA sequences and annotates each amino acid with underlying RNA and DNA variants. We have also developed a genome and transcriptome variation graph builder in Exacto to generate synthetic tumor and matched normal genomes as well as tumor transcriptomes. To perform a comprehensive benchmark study for mutant peptide identification, we developed VSTOL and Nexus. VSTOL introduces a new framework, called Occam’s Variant Grammar, to characterize DNA and RNA variants in a unified representation for existing variant callers.
Nexus is a Nextflow suite that runs over 50 tools for neoantigen discovery. Using synthetic samples generated from the variation graphs, we benchmarked somatic DNA variant calling with Exacto, ClairS, Nanomonsv, Savana, Severus, and SVision-pro. Exacto achieved the highest or tied-highest recall for all simulated variant types (SNV, deletion, insertion = 1.000; translocation and inversion = 0.983). It outperformed the next-best tools by substantial margins: for translocations and inversions, Exacto achieved a precision of 0.688 versus 0.205 for Savana (both at 0.983 recall); for deletions, a precision of 0.945 versus 0.761 for Savana (both at 1.000 recall); and for insertions, a recall of 1.000 versus 0.901 for Nanomonsv (both at 1.000 precision). Given this performance, we expect that Exacto will become the standard method for discovery of immunogenic neoantigens using long-read DNA and RNA sequencing.

Citation Format: Jin Seok Lee, Maria J. Sambade, Jeremy Wang, Alex Rubinsteyn, Benjamin G. Vincent. Exacto: Accurate identification of mutant proteoforms and neoantigens using integrative long-read sequencing [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2026; Part 1 (Regular Abstracts); 2026 Apr 17-22; San Diego, CA. Philadelphia (PA): AACR; Cancer Res 2026;86(7 Suppl):Abstract nr 5511.
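
Note: the precision and recall pairs reported in the abstract can be combined into a single F1 score (the harmonic mean of precision and recall) to make the tool comparisons easier to read side by side. The sketch below is illustrative only. The abstract does not report F1, so the values it computes are derived arithmetic and not results from the study, and the names `f1_score`, `reported`, and `f1_table` are hypothetical.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall) pairs copied from the abstract's benchmark summary.
reported = {
    ("translocation/inversion", "Exacto"): (0.688, 0.983),
    ("translocation/inversion", "Savana"): (0.205, 0.983),
    ("deletion", "Exacto"): (0.945, 1.000),
    ("deletion", "Savana"): (0.761, 1.000),
    ("insertion", "Exacto"): (1.000, 1.000),
    ("insertion", "Nanomonsv"): (1.000, 0.901),
}

# Derived F1 per (variant type, tool), rounded to three decimals.
f1_table = {
    key: round(f1_score(p, r), 3) for key, (p, r) in reported.items()
}

for (variant, tool), f1 in f1_table.items():
    print(f"{variant:<26}{tool:<12}F1 = {f1:.3f}")
```

Because F1 weights precision and recall equally, it may not reflect the priorities of a neoantigen pipeline, where a missed variant and a false call can carry different costs; the separate precision and recall figures in the abstract remain the primary evidence.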