Biobank-Scale Polygenic Prediction in Admixed Populations Using Local Ancestry via the Group Lasso

This paper is a preprint and has not been certified by peer review.

Authors

Bonet, D.; Yang, J.; Hastie, T.; Ioannidis, A. G.

Abstract

Polygenic risk models trained in one ancestry often fail to perform well in others, in part due to linkage disequilibrium and allele frequency differences across ancestries. In response, separate models trained for specific ancestries have been introduced. However, this single-ancestry approach is untenable in admixed groups, which have ancestry from multiple sources varying across the genome and between individuals. Here we present Combine, a biobank-scale sparse regression framework that augments per-variant genotype features with local ancestry dosages and fits all effects jointly using a variant-level group lasso penalty. In 99,298 admixed participants from the All of Us Research Program, Combine substantially outperforms state-of-the-art multi-ancestry summary-statistic approaches (e.g., 144% relative improvement over PRS-CSx for white blood cell count). Furthermore, it matches or improves upon the predictive performance of highly optimized individual-level models (iPGS/snpnet) across seven of nine evaluated phenotypes, while uniquely providing locus-level interpretability to disentangle shared allelic effects from ancestry-linked tagging. An ancestry-specific extension, Combine-S, estimates haplotypic ancestry-associated SNP effects together with local-ancestry terms, enabling systematic identification of ancestry-dependent effect magnitudes and sign differences at established and plausible loci. Finally, we show that incorporating external GWAS evidence through group-specific penalty weights improves LDL cholesterol prediction without pre-filtering variants. Together, Combine provides a scalable framework for polygenic modeling that prioritizes efficient local ancestry-aware modeling and interpretation in admixed biobanks.
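
To make the modeling idea in the abstract concrete, the following is a minimal sketch of a variant-level group lasso, not the Combine implementation. It assumes simulated two-way admixed data, one group per variant holding a genotype dosage and a single local ancestry dosage, and a plain proximal gradient solver swapped in for whatever biobank-scale solver the authors use. All function and variable names (`fit_group_lasso`, `group_soft_threshold`, `gwas_supported`) are hypothetical.

```python
import numpy as np

def group_soft_threshold(b, t):
    """Block soft-thresholding: shrink a whole group toward zero by t."""
    norm = np.linalg.norm(b)
    if norm <= t:
        return np.zeros_like(b)
    return (1.0 - t / norm) * b

def fit_group_lasso(X, y, groups, lam, weights=None, n_iter=500):
    """Proximal gradient descent (assumed solver) for
    (1/2n)||y - X b||^2 + lam * sum_g w_g ||b_g||_2."""
    n, p = X.shape
    group_ids = np.unique(groups)
    if weights is None:
        weights = {g: 1.0 for g in group_ids}
    # Step size = 1 / Lipschitz constant of the squared-error gradient.
    step = n / np.linalg.norm(X, 2) ** 2
    beta = np.zeros(p)
    for _ in range(n_iter):
        grad = X.T @ (X @ beta - y) / n
        z = beta - step * grad
        for g in group_ids:
            idx = groups == g
            beta[idx] = group_soft_threshold(z[idx], step * lam * weights[g])
    return beta

# Simulated data (assumption): 400 individuals, 30 variants, two ancestries.
rng = np.random.default_rng(0)
n, n_variants = 400, 30
geno = rng.binomial(2, 0.3, size=(n, n_variants)).astype(float)  # allele dosage
la = rng.binomial(2, 0.5, size=(n, n_variants)).astype(float)    # local ancestry dosage

# Interleave columns so each variant contributes [genotype, local ancestry].
X = np.empty((n, 2 * n_variants))
X[:, 0::2] = geno
X[:, 1::2] = la
X = (X - X.mean(axis=0)) / X.std(axis=0)
groups = np.repeat(np.arange(n_variants), 2)

# Variant 0 has both an allelic and an ancestry-linked effect; variant 2 only allelic.
true_beta = np.zeros(2 * n_variants)
true_beta[0], true_beta[1], true_beta[4] = 1.0, 0.5, -0.8
y = X @ true_beta + rng.normal(0.0, 0.5, size=n)
y = y - y.mean()

beta = fit_group_lasso(X, y, groups, lam=0.2)
selected = sorted({int(g) for g in groups[beta != 0]})

# Group-specific penalty weights: a lower weight for variants with (hypothetical)
# external GWAS support makes those groups easier to select.
gwas_supported = {0, 2}
weights = {g: (0.5 if g in gwas_supported else 1.0) for g in np.unique(groups)}
beta_weighted = fit_group_lasso(X, y, groups, lam=0.2, weights=weights)
```

Because the penalty acts on the norm of each variant's coefficient pair, a variant enters or leaves the model as a unit. For a selected variant, the relative size of its genotype and local ancestry coefficients is what supports the locus-level reading described in the abstract, under the simplifying assumptions of this sketch.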
