GhostHunter: A Multi-Test Framework for Detecting Ghost Introgression
Wanjiku, M.; Sethuraman, A.
Abstract
Gene flow from extinct or unsampled "ghost" populations is increasingly recognized but remains difficult to detect without donor genomes. Signals of ghost introgression can mimic other demographic processes such as bottlenecks, population structure, or migration among sampled populations. We introduce GhostHunter, a multi-step framework that combines (1) coalescent time distributions across loci, (2) likelihood-based tests under the isolation-with-migration (IM) model, and (3) population structure inference under an admixture model to detect ghost introgression from genomic data. By integrating independent signals, GhostHunter captures hidden ancestry, heterogeneity in genealogies, and improved fit of models that include unsampled sources. Simulations under the IM model show that ghost introgression produces clear genome-wide signatures in TMRCA distributions, including multimodality and step-like ECDF patterns, even when median coalescent times are similar. Likelihood comparisons consistently support unsampled lineages, though estimation of migration depends on admixture strength and divergence time. Clustering analyses also suggest increased support for population structure, typically at low K. Applying GhostHunter to 1000 Genomes CEU and CHS data, we find strong heterogeneity in TMRCA estimates (62,748 windows; median 42,471 generations; KS D = 0.271; dip test rejects unimodality), consistent with mixed genealogies. Structure analyses favor K = 2, reflecting CEU-CHS divergence. However, IMa3 does not support non-zero ghost gene flow, suggesting that genealogical signals are clearer than migration estimates. Overall, GhostHunter provides a practical screening framework for detecting hidden ancestry and reducing inference errors, especially in non-model systems.
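To make the ECDF and KS comparison of TMRCA distributions concrete, here is a minimal sketch in standard-library Python. It is not GhostHunter's implementation: the function names (`ecdf`, `ks_statistic`, `simulate_tmrca`), the exponential mixture used as toy data, and the parameters `base_mean`, `ghost_mean`, and `ghost_fraction` are all assumptions made for illustration. Real per-window TMRCAs would come from coalescent simulation or inference, not from this toy generator.

```python
import random
from bisect import bisect_right


def ecdf(sample):
    """Return F(x): the fraction of sample values less than or equal to x."""
    ordered = sorted(sample)
    n = len(ordered)
    return lambda x: bisect_right(ordered, x) / n


def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov D: largest vertical gap between two ECDFs.

    Both ECDFs are right-continuous step functions, so the maximum gap is
    attained at one of the observed values.
    """
    fa, fb = ecdf(a), ecdf(b)
    return max(abs(fa(x) - fb(x)) for x in set(a) | set(b))


def simulate_tmrca(n_windows, ghost_fraction, rng,
                   base_mean=40_000, ghost_mean=120_000):
    """Toy per-window TMRCAs (hypothetical, not a coalescent simulation).

    Each window is exponential with mean base_mean, except that a fraction
    ghost_fraction of windows is drawn from a deeper exponential with mean
    ghost_mean, standing in for genealogies that trace to an unsampled lineage.
    """
    values = []
    for _ in range(n_windows):
        mean = ghost_mean if rng.random() < ghost_fraction else base_mean
        values.append(rng.expovariate(1.0 / mean))
    return values


rng = random.Random(1)  # fixed seed so the sketch is reproducible
no_ghost = simulate_tmrca(2000, 0.0, rng)
with_ghost = simulate_tmrca(2000, 0.2, rng)

d = ks_statistic(no_ghost, with_ghost)
print(f"KS D between toy TMRCA distributions: {d:.3f}")
```

The statistic is computed directly from the two ECDFs so that the link to the step-like ECDF patterns is visible. In practice a library routine such as `scipy.stats.ks_2samp` would also return a p-value, and a separate test of unimodality (for example the dip test mentioned above) would be needed to assess multimodality.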