Abstract Cancer cells differ from healthy cells due to somatic aberrations (e.g., somatic mutations or structural variations). Studying these mutations helps us better understand tumor evolution and opens the way to more effective treatment strategies. Cancer cells exhibit heterogeneity across different populations, even within a single tumor, and these populations are defined by their unique somatic mutations. Given a set of somatic mutations, some shared across clones and some unique, determining which mutations co-occur enables phasing of tumor clones. Phasing these mutations from bulk sequencing data is therefore crucial for studying tumor evolution. SNVs and short indels are the most common somatic mutations in tumor cells. Although these mutations can be detected effectively with short reads, linking nearby mutations remains challenging because of the limited length of these reads. Long reads, on the other hand, have been used successfully for direct phasing of germline variants into megabase-scale phase blocks by linking distant SNVs, and thus show promise for phasing somatic variants. Reconstruction of tumor clones is a multi-allelic phasing problem. Obtaining a phylogeny of the detected somatic point mutations (PMs) is a first step in phasing these mutations into different clones, where nodes in the constructed phylogeny correspond to possible clonal haplotypes. For this purpose, we propose timing pairs of PMs against each other based on their co-occurrence in the reads that cover both locations and on whether these reads support the reference or the alternate allele. Given two point mutations and a set of reads that cover both positions, it is possible to deduce whether the two mutations occurred one after the other in the same branch, co-occurred, or occurred in different branches (i.e., are divergent). Since the timing of PM pairs is transitive, it is possible to obtain longer chains by timing PM pairs that are not connected by reads.
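The pairwise timing described above can be illustrated with a minimal sketch. The abstract does not give the decision rules, so the ones below are an assumption: they follow an infinite-sites style argument (each mutation arises once and is never lost), and the names `Relation`, `time_pair`, `infer_before`, and the `min_support` threshold are hypothetical, not taken from the authors' software. Each read spanning both sites is reduced to a pair of booleans indicating whether it carries the alternate allele at mutation A and at mutation B.

```python
from collections import Counter
from enum import Enum


class Relation(Enum):
    CO_OCCURRING = "co-occurring"    # no read separates the two mutations
    A_BEFORE_B = "A before B"        # B arose on a background already carrying A
    B_BEFORE_A = "B before A"        # A arose on a background already carrying B
    DIVERGENT = "divergent"          # the mutations lie on different branches
    CONFLICT = "conflict"            # all three alt-bearing patterns observed
    UNINFORMATIVE = "uninformative"  # too few reads carry an alternate allele


def time_pair(reads, min_support=2):
    """Classify mutations A and B from reads that span both sites.

    `reads` is an iterable of (a_is_alt, b_is_alt) tuples, one per read.
    `min_support` is an assumed noise threshold: a pattern counts as
    observed only if at least that many reads show it.
    """
    counts = Counter(reads)
    both = counts[(True, True)] >= min_support
    only_a = counts[(True, False)] >= min_support
    only_b = counts[(False, True)] >= min_support

    if both and only_a and only_b:
        return Relation.CONFLICT      # inconsistent with infinite sites
    if only_a and only_b:
        return Relation.DIVERGENT     # never seen together on one read
    if both and only_a:
        return Relation.A_BEFORE_B    # A is seen without B, but not B without A
    if both and only_b:
        return Relation.B_BEFORE_A
    if both:
        return Relation.CO_OCCURRING  # always seen together
    return Relation.UNINFORMATIVE


def infer_before(before_pairs):
    """Transitive closure of (earlier, later) pairs.

    Adds an ordering for pairs that no read covers directly:
    if X precedes Y and Y precedes Z, then X precedes Z.
    """
    closure = set(before_pairs)
    changed = True
    while changed:
        changed = False
        for x, y in list(closure):
            for u, v in list(closure):
                if y == u and (x, v) not in closure:
                    closure.add((x, v))
                    changed = True
    return closure


# Five reads carry both mutations, four carry only A, six carry neither.
example_reads = [(True, True)] * 5 + [(True, False)] * 4 + [(False, False)] * 6
example_relation = time_pair(example_reads)
```

Under these assumed rules, `example_relation` is `Relation.A_BEFORE_B`, and `infer_before({("A", "B"), ("B", "C")})` also orders A before C even when no read spans both sites. Reads carrying the reference allele at both sites are ignored here because, in bulk data, they may come from normal cells or from an unmutated chromosome copy.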
We represent these relationships as a graph in which PMs are nodes and each type of relationship is represented by a different edge type. We applied our approach to the H2009 and H1437 cell lines from the CASTLE collection (https://github.com/CASTLE-Panel/castle) using regular and ultra-long (100 kb+) Oxford Nanopore reads. Although these cell lines are less heterogeneous than typical tumor samples, phasing somatic mutations can also distinguish duplicated chromosome copies. For the H1437 and H2009 cell lines, we detected a total of 87,999 and 162,334 somatic SNPs, respectively. The graphs constructed with these SNPs contained connected components with median spans of 102 kb and 55 kb (maximum: 9.5 Mb and 6.4 Mb). For both cell lines, connected components had a median of 2 haplotypes (maximum: 56 and 80). Of these haplotypes, a median of 1 in both cell lines (maximum: 23 and 46) consisted of multiple SNPs. We plan to extend our approach to include timing of structural variations.
Citation Format: Ataberk Donmez, Mikhail Kolmogorov. Phasing tumor clones by timing point mutations using bulk long-read sequencing [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2026; Part 1 (Regular Abstracts); 2026 Apr 17-22; San Diego, CA. Philadelphia (PA): AACR; Cancer Res 2026;86(7 Suppl):Abstract nr 6912.