SplitAligner: A Gene-Species Tree Reconciliation Framework Using Split-Based Branch Mapping
Wu, J.
Abstract
Phylogenomic analyses are increasingly focused on branch-specific questions within a fixed species tree. However, two pervasive challenges in real datasets, missing taxa and gene-tree/species-tree discordance, complicate the comparability of branches across loci. Here, we introduce SplitAligner, a split-based framework that defines branch identity on a fixed species-tree backbone and evaluates it gene by gene under varying taxon coverage. For each species-tree branch, SplitAligner projects the branch's split onto the gene-specific taxon set to determine whether the branch is evaluable or structurally missing because its projected split is degenerate. Under fixed-topology gene trees, this projection reveals branch fusion, in which multiple species-tree branches collapse into an indistinguishable fusion group on the observed taxa. SplitAligner reports such cases as composite fused-branch identities and aggregates branch lengths accordingly. Under free-topology gene trees, SplitAligner further identifies topology-induced missingness (NA_topo), in which a branch is decisive under the projected-split criterion but its projected split is absent from the gene tree; this separates discordance-driven absence from coverage-driven missingness. These operations produce standardized branch-by-gene tables and a branch-wise concordance score (Support), defined as the fraction of decisive genes whose free-topology trees recover each projected split. Applying SplitAligner to 2,275 single-copy genes from a dataset of 302 mammals reveals heterogeneous concordance across the mammalian phylogeny and highlights internodes with elevated discordance-associated missingness. The resulting branch coordinate system provides a general framework for branch-based estimates of evolutionary rates, selective constraints, and other branch-wise summaries across thousands of loci.
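The projection logic summarized above can be illustrated with a minimal Python sketch. It is not the SplitAligner implementation. It assumes that each internal species-tree branch is encoded as the set of taxa on one side of its split, that a projected split is degenerate when either side retains fewer than two taxa, and that the splits of a free-topology gene tree are available in the same encoding. The function names and the status labels "present" and "NA_struct" are hypothetical; only NA_topo and Support are terms from the text. For brevity the sketch computes fusion groups and topology status in one pass, although the text describes them under fixed-topology and free-topology gene trees respectively.

```python
from collections import defaultdict


def project_split(clade, species_taxa, gene_taxa):
    """Restrict the bipartition clade | rest to the taxa present in one gene."""
    inside = frozenset(clade) & gene_taxa
    outside = (species_taxa - frozenset(clade)) & gene_taxa
    return frozenset((inside, outside))


def is_degenerate(split):
    """Assumed criterion: an internal split needs at least two taxa per side."""
    return any(len(side) < 2 for side in split)


def classify_gene(branches, species_taxa, gene_taxa, gene_splits):
    """Return ({branch: status}, fusion_groups) for a single gene.

    branches:    {branch_name: set of taxa on one side of the species-tree split}
    gene_splits: set of bipartitions observed in the free-topology gene tree,
                 encoded like the output of project_split.
    """
    status = {}
    by_split = defaultdict(list)
    for name, clade in branches.items():
        split = project_split(clade, species_taxa, gene_taxa)
        if is_degenerate(split):
            status[name] = "NA_struct"  # hypothetical label: coverage-driven
        else:
            by_split[split].append(name)
            status[name] = "present" if split in gene_splits else "NA_topo"
    # Branches sharing one non-degenerate projected split form a fusion group.
    fused = [sorted(names) for names in by_split.values() if len(names) > 1]
    return status, fused


def support(statuses):
    """Fraction of decisive genes whose gene tree recovers the projected split."""
    decisive = [s for s in statuses if s != "NA_struct"]
    if not decisive:
        return None  # branch is not evaluable in any gene
    return sum(s == "present" for s in decisive) / len(decisive)


# Toy species tree (((A,B),C),(D,E)) with two internal branches.
species = frozenset("ABCDE")
branches = {"AB": {"A", "B"}, "ABC": {"A", "B", "C"}}

# Gene lacking taxon C, with a discordant gene tree ((A,D),(B,E)).
taxa = frozenset("ABDE")
splits = {frozenset((frozenset("AD"), frozenset("BE")))}
print(classify_gene(branches, species, taxa, splits))
```

In the toy gene, removing taxon C makes both branches project onto the same split {A,B} | {D,E}, so they form one fusion group, and because the gene tree does not contain that split both are reported as NA_topo rather than as structurally missing.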