Abstract

Short copy number variants (CNVs) account for a large proportion of the somatic cancer genome landscape. Analysis of CNVs from the Pan-Cancer Analysis of Whole Genomes (PCAWG) Consortium showed that the sizes of copy number deletions and duplications have multimodal distributions, with one of the major modes centered around 1 kb [1]. Sensitive profiling of structural variants (SVs), which include CNVs, with long-read sequencing (LRS) showed that median SV lengths range from 133 bp to 1 kbp across multiple SV callers [2]. However, existing CNV analysis tools for short-read sequencing (SRS) data are based on a binning approach, which limits the reliable detection of these short CNVs. Exome sequencing data pose the additional challenge of uneven read depth, because capture efficiency differs between genomic positions. Therefore, we developed a methodology that uses the read depth available at every genomic position to discover short CNVs, or Micro CNVs, from exome sequencing data. We employed an approach we named adaptive window extension, in which we extended windows to identify genomic segments whose read depths deviate significantly from those of control samples. We then fine-tuned the variant boundaries by searching for an optimal score in the base-level space. We evaluated our approach on simulated data in which we locally simulated 12 genomic loci containing well-known cancer genes, both tumor suppressors and oncogenes. We obtained a median sensitivity over 0.9 for 2-fold copy number deletions and duplications in 90% purity tumors for variants as short as 300 bp over the captured region, which corresponds to 1-2 exons involved in the variation. At 50% purity, similar performance was observed for simulated variants as short as 1 kbp over the captured region, while retaining high specificity.
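The two steps described above (window extension against controls, then base-level boundary refinement) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the test statistic, thresholds, or step sizes, so the per-base z-score, the sum-over-square-root-of-length window score, and the names `per_base_z`, `window_score`, `extend_window`, and `refine_boundaries` are all assumptions made for illustration. The score also treats neighboring positions as independent, which real read depth data would violate.

```python
import math
from statistics import mean, stdev

def per_base_z(tumor, controls):
    """Per-position z-score of tumor depth against the control samples."""
    z = []
    for i, depth in enumerate(tumor):
        ctrl = [c[i] for c in controls]
        sd = stdev(ctrl)
        z.append((depth - mean(ctrl)) / sd if sd > 0 else 0.0)
    return z

def window_score(z, start, end):
    """Sum of per-base z-scores over [start, end), scaled by sqrt(length)."""
    return sum(z[start:end]) / math.sqrt(end - start)

def extend_window(z, start, end, step=25, thresh=5.0):
    """Grow a seed window while each newly added flank still deviates."""
    seed = window_score(z, start, end)
    if abs(seed) < thresh:
        return None  # seed window does not deviate from controls
    sign = 1.0 if seed > 0 else -1.0
    while end < len(z):  # extend to the right
        new_end = min(len(z), end + step)
        if sign * window_score(z, end, new_end) < thresh:
            break
        end = new_end
    while start > 0:  # extend to the left
        new_start = max(0, start - step)
        if sign * window_score(z, new_start, start) < thresh:
            break
        start = new_start
    return start, end

def refine_boundaries(z, start, end, radius=25):
    """Base-level search around the coarse boundaries for the best score."""
    best = (abs(window_score(z, start, end)), start, end)
    for s in range(max(0, start - radius), start + radius + 1):
        for e in range(max(s + 1, end - radius), min(len(z), end + radius) + 1):
            best = max(best, (abs(window_score(z, s, e)), s, e))
    return best[1], best[2]

# Toy data (hypothetical): uneven capture depth, five controls, and a tumor
# with depth reduced to 55% over positions 410..689 (a one-copy loss at
# 90% purity under a simple linear mixing assumption).
n = 1000
base = [100 + 20 * math.sin(i / 30) for i in range(n)]
controls = [[b * f for b in base] for f in (0.90, 0.95, 1.00, 1.05, 1.10)]
tumor = [b * (0.55 if 410 <= i < 690 else 1.0) for i, b in enumerate(base)]

z = per_base_z(tumor, controls)
coarse = extend_window(z, 500, 550)   # step-aligned segment
fine = refine_boundaries(z, *coarse)  # base-level boundaries
print(coarse, fine)
```

On this noise-free toy input, extension overshoots to the step-aligned segment 400-700 and refinement recovers the simulated boundaries 410-690, because the score drops both when unaffected bases are included and when affected bases are left out. Real exome data would need a noise model and calibrated thresholds.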
Given that the current lower limit of sensitive detection for germline CNVs (at 100% purity) is 3 exons (∼750 bp in the captured region) after compromising specificity [3], we believe our tool presents competitive results even in lower-purity settings and, when applied to a larger cancer exome cohort, will discover additional cancer driver genes and actionable genes.

1. Li, Y., et al., Patterns of somatic structural variation in human cancer genomes. Nature, 2020. 578(7793): p. 112-121.
2. Liu, L., et al., Performance of somatic structural variant calling in lung cancer using Oxford Nanopore sequencing technology. BMC Genomics, 2024. 25(1): p. 898.
3. Babadi, M., et al., GATK-gCNV enables the discovery of rare copy number variants from exome sequencing data. Nat Genet, 2023. 55(9): p. 1589-1597.

Citation Format: Jin Young Lee, Hyo Young Choi, D. Neil Hayes. Bioinformatic method for the detection of micro copy number variations with exome sequencing data [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2026; Part 1 (Regular Abstracts); 2026 Apr 17-22; San Diego, CA. Philadelphia (PA): AACR; Cancer Res 2026;86(7 Suppl):Abstract nr 5508.