Abstract

Microsatellites (mSats), or short tandem repeats (STRs), are repeated 1-6 bp DNA motifs that are abundantly distributed across the human genome. Variation in STR length contributes to genetic diversity and structural variation, and expansions beyond a pathogenic threshold underlie nearly 60 genetic disorders. mSat repeats can also serve as non-canonical enhancers for transcriptional regulators, including by binding the EWS::FLI1 fusion oncoprotein of Ewing sarcoma. Genome-wide analysis of mSats has been limited by the constraints of short-read sequencing, including read length and mapping ambiguity. Long-read sequencing has improved analysis of these regions but requires specialized algorithms. We developed a computational pipeline for genome-wide, reference-based detection, length genotyping, sequence decomposition, and visualization of tetrameric mSats using long-read nanopore whole-genome sequencing. We applied this approach to GGAA mSats in five Ewing sarcoma cell lines and 100 diverse normal population genomes. We found that both EWS::FLI1 binding to GGAA mSats and chromatin accessibility correlated with repeat length. Comparative analysis revealed a subset of mSats (2-3%) that were selectively expanded or contracted in Ewing sarcoma relative to normal genomes. Although we hypothesized that this variation in mSat length would converge towards a similar repeat length, we found that expanded loci tend to have repeat lengths between 11 and 13, whereas contracted loci commonly fall between 4 and 6. Further, expanded mSats had the highest proportion with EWS::FLI1 occupancy and accessible chromatin, compared with same-length and contracted mSats. Finally, we show that mSats with cell line-specific gained or lost chromatin accessibility were associated with expansion and contraction, respectively, in those cells.
These results reveal a selective expansion of chromatin-accessible mSats in Ewing sarcoma and provide a generalizable framework for resolving the genetic and structural complexity of mSats in human disease using long-read sequencing.

Citation Format: Sara K. Peterson, A McCauley Massie, Alex Rubinsteyn, Jeremy R. Wang, Ian J. Davis. Analysis of long-read sequencing data with vmwhere reveals variation in microsatellite length and chromatin state in Ewing sarcoma [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2026; Part 1 (Regular Abstracts); 2026 Apr 17-22; San Diego, CA. Philadelphia (PA): AACR; Cancer Res 2026;86(7 Suppl):Abstract nr 2685.