Biomolecules often exhibit significant structural disorder, which obscures how function emerges. For single-stranded nucleic acids, disorder is especially pronounced because unpaired regions are flexible and hybridization patterns compete. To address this complexity, we develop and apply clustering methods based on secondary structure. The core idea is to use the number of base pairs that must be reorganized between two structures as the distance, rather than atomistic metrics such as RMSD. This compresses the vast atomistic landscape into a structural space in which similarities and transitions are physically meaningful. The approach accommodates a variety of algorithms, from k-means to hierarchical and density-based clustering, and identifies order even when conventional structural metrics fail. We demonstrate the method on replica-exchange molecular dynamics simulations of pseudo-random, single-stranded DNA. Despite the absence of a unique fold, secondary-structure clustering identifies robust hybridization motifs that persist across temperatures. In particular, it reveals residual hybridization at high temperature, with probabilities that follow a Boltzmann distribution, and delineates the evolution of structural subensembles during melting. The resulting clusters clarify how DNA progressively loses order, providing a natural coarse-graining of the structural space. Moreover, examining how clusters evolve with control parameters such as temperature highlights how clustering can capture the flow of information during physical processes. We further investigate hierarchical strategies and compare secondary-structure clustering with atomistic clustering. Overall, we establish a general framework for analyzing biomolecular ensembles through secondary structure.
This approach clarifies behavior during disorder-driven transitions, advances the conceptual foundation of clustering as a tool for biophysics, and suggests broad utility beyond nucleic acids, including for intrinsically disordered proteins.
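As an illustration of the distance described above, the following minimal sketch may help. It rests on several assumptions that are not stated in the text: structures are written in pseudoknot-free dot-bracket notation, the "number of base pairs that have to be reorganized" is read as the count of base pairs present in exactly one of the two structures, and the clustering step is a simple single-linkage cut at a fixed threshold. The function names and the example structures are hypothetical and are not taken from the simulations.

```python
def base_pairs(dot_bracket):
    """Return the set of (i, j) base pairs encoded in a dot-bracket string.

    Assumption: plain, pseudoknot-free notation using only '(', ')' and '.'.
    """
    stack, pairs = [], set()
    for j, symbol in enumerate(dot_bracket):
        if symbol == "(":
            stack.append(j)
        elif symbol == ")":
            if not stack:
                raise ValueError("unbalanced ')' in structure")
            pairs.add((stack.pop(), j))
    if stack:
        raise ValueError("unbalanced '(' in structure")
    return frozenset(pairs)


def base_pair_distance(a, b):
    """Count the base pairs that appear in exactly one of the two structures."""
    return len(base_pairs(a) ^ base_pairs(b))


def single_linkage_clusters(structures, threshold):
    """Group structures connected by chains of distances <= threshold.

    This equals cutting a single-linkage hierarchy at `threshold`; it is an
    illustrative stand-in, not necessarily the algorithm used in this work.
    """
    n = len(structures)
    parent = list(range(n))

    def find(i):
        # Follow parent links to the root, shortening the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if base_pair_distance(structures[i], structures[j]) <= threshold:
                parent[find(i)] = find(j)

    # Relabel roots as consecutive integers in order of first appearance.
    relabel = {}
    return [relabel.setdefault(find(i), len(relabel)) for i in range(n)]


# Hypothetical 12-nucleotide structures: two hairpin families.
structures = [
    "((....))....",
    "((....))....",
    "(......)....",
    "....((....))",
    "....(......)",
]
labels = single_linkage_clusters(structures, threshold=1)
print(labels)
```

In this toy ensemble, the first three structures share a hairpin at the 5' end and the last two share one at the 3' end, so a threshold of one base pair separates them into two clusters even though all five would differ in atomistic detail. Any clustering algorithm that accepts a precomputed distance matrix could replace the single-linkage step.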
Baral et al. (Sun,) studied this question.