The abundant discordance between evolutionary relationships across the genome has rekindled interest in methods for comparing and averaging trees on a shared leaf set. However, compared to tree topology, where much progress has been made, branch lengths have proven more challenging to handle. Species tree branch lengths can be measured in various units, often different from those of gene trees. Moreover, rates of evolution vary across the genome, across the species tree, and across specific branches of gene trees. These factors compound the stochasticity of coalescence times and estimation noise, making branch lengths highly heterogeneous across the genome. For many downstream applications in phylogenomic analyses, branch lengths are as important as the topology, yet existing tools to compare and combine weighted trees are limited. In this paper, we address the question of matching one tree to another while accounting for their branch lengths. We define a series of computational problems called Topology-Constrained Metric Matching (TCMM) that seek to transform the branch lengths of a query tree based on a reference tree. We show that TCMM problems can be solved efficiently using a linear algebraic formulation coupled with dynamic programming preprocessing. While many applications of this framework are possible, we explore two in this paper: embedding leaves of gene trees in Euclidean space to find outliers potentially indicative of estimation errors, and summarizing gene tree branch lengths onto the species tree. In both applications, pairing our method with existing methods increases their accuracy at limited computational expense.
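The abstract describes TCMM only at a high level. As a rough illustration of what a linear algebraic formulation of branch-length matching can look like, the sketch below uses one hypothetical instance, not necessarily the paper's exact objective. Leaf-to-leaf path lengths are linear in branch lengths (d = A w, with A a 0/1 matrix indexed by leaf pairs and edges), so matching a query tree's pairwise distances to a reference tree's can be posed as the regularized least-squares problem min_w ||A w - d_ref||^2 + lam * ||w - w0||^2, where w0 holds the query's original lengths. The function names, the parent-dictionary tree encoding, and the penalty `lam` are assumptions for illustration. The sketch builds A naively instead of using the dynamic-programming preprocessing the abstract mentions, and it does not enforce non-negative branch lengths.

```python
from itertools import combinations


def path_to_root(leaf, parent):
    """Nodes from `leaf` up to (but excluding) the root.

    Each non-root node names the edge joining it to its parent (hypothetical encoding)."""
    path, node = [], leaf
    while node in parent:
        path.append(node)
        node = parent[node]
    return path


def design_matrix(leaves, parent, edges):
    """One row per leaf pair, one column per edge; 1.0 if the edge lies on the pair's path."""
    col = {e: k for k, e in enumerate(edges)}
    rows, pairs = [], []
    for u, v in combinations(leaves, 2):
        # Edges above the lowest common ancestor appear on both root paths and cancel out.
        on_path = set(path_to_root(u, parent)) ^ set(path_to_root(v, parent))
        row = [0.0] * len(edges)
        for e in on_path:
            row[col[e]] = 1.0
        rows.append(row)
        pairs.append((u, v))
    return rows, pairs


def solve(M, b):
    """Solve M x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(M)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[p] = aug[p], aug[c]
        for r in range(c + 1, n):
            f = aug[r][c] / aug[c][c]
            for k in range(c, n + 1):
                aug[r][k] -= f * aug[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (aug[r][n] - sum(aug[r][k] * x[k] for k in range(r + 1, n))) / aug[r][r]
    return x


def match_branch_lengths(leaves, parent, length, ref_dist, lam=1e-3):
    """Hypothetical matcher: min ||A w - d_ref||^2 + lam * ||w - w0||^2 via normal equations."""
    edges = sorted(parent)  # one edge per non-root node
    A, pairs = design_matrix(leaves, parent, edges)
    d = [ref_dist[frozenset(p)] for p in pairs]
    m = len(edges)
    # (A^T A + lam I) w = A^T d + lam w0; lam > 0 keeps the system nonsingular.
    N = [[sum(row[i] * row[j] for row in A) + (lam if i == j else 0.0) for j in range(m)]
         for i in range(m)]
    rhs = [sum(A[r][i] * d[r] for r in range(len(A))) + lam * length[edges[i]]
           for i in range(m)]
    return dict(zip(edges, solve(N, rhs)))


if __name__ == "__main__":
    # Three-leaf star: every leaf hangs directly off the root "r".
    parent = {"a": "r", "b": "r", "c": "r"}
    query = {"a": 1.0, "b": 1.0, "c": 1.0}
    ref = {frozenset(("a", "b")): 3.0, frozenset(("a", "c")): 4.0,
           frozenset(("b", "c")): 5.0}
    print(match_branch_lengths(["a", "b", "c"], parent, query, ref, lam=1e-9))
```

For the three-leaf star above, the reference distances are exactly realizable, so a tiny penalty yields branch lengths close to 1, 2 and 3. A large `lam` instead keeps the result near the query's own lengths, which is one simple way to trade fidelity to the reference against preserving the query tree.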
Arasti et al. studied this question.