Inference of fitness landscapes with heterogeneous patterns of epistasis across sites

This paper is a preprint and has not been certified by peer review.


Authors

Marti-Gomez, C.; McCandlish, D. M.

Abstract

Fitness landscapes provide a framework for understanding how genetic variation shapes evolutionary outcomes. Although these landscapes were long treated as abstract conceptual objects, recent advances in genetic engineering and high-throughput phenotyping have enabled the empirical measurement of phenotypic values across large combinatorial sequence spaces. These developments create a need for statistical frameworks that can summarize, infer, and interpret fitness landscapes in the presence of complex genetic interactions. Here, we introduce a framework for summarizing the structure of genetic interactions across sites based on the average squared local k-way epistatic coefficients between mutations at different subsets of sites, and derive the precise manner in which the variance in these local k-way epistatic coefficients across backgrounds relates to epistasis of orders higher than k. These statistics can be computed exactly for complete combinatorial landscapes and are related to classical statistics in the fitness landscape literature. Moreover, they can be estimated from empirical correlations when data are incomplete or noisy, and used to define an empirical Bayes prior for fitness landscape inference that differentially penalizes interactions involving different subsets of sites. We apply this inference method to diverse high-throughput protein and RNA combinatorial mutagenesis datasets and find that fitness landscapes often show highly structured patterns of genetic interactions across positions. Finally, we use this model to infer a fitness landscape for a dynamic self-splicing intron comprising 65,536 genotypes, and describe in detail the main genetic interactions that shape the structure of this landscape and how they relate to the underlying molecular mechanism. Together, these results provide new tools for summarizing and modeling complex fitness landscapes, and for linking large-scale empirical data to the mathematical theory of fitness landscapes.
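To make the central statistic concrete, the following is a minimal sketch of the average squared local pairwise (k = 2) epistatic coefficient on a toy landscape. It is not the authors' implementation: it assumes a complete landscape with two alleles per site stored as a NumPy array of shape (2, ..., 2), and the function names `local_pairwise_epistasis` and `summarize_pairs` as well as the toy coefficients are hypothetical. The normalization used in the paper may differ.

```python
import itertools
import numpy as np


def local_pairwise_epistasis(f, i, j):
    """Local pairwise epistatic coefficients for sites i and j.

    f is a complete biallelic landscape of shape (2,) * L (an assumption
    of this sketch). The result has one entry per genetic background,
    i.e. per combination of alleles at the remaining L - 2 sites.
    """
    g = np.moveaxis(f, (i, j), (0, 1))  # bring the two focal sites to the front
    return g[1, 1] - g[1, 0] - g[0, 1] + g[0, 0]


def summarize_pairs(f):
    """Mean squared local coefficient and its variance across backgrounds."""
    stats = {}
    for i, j in itertools.combinations(range(f.ndim), 2):
        eps = local_pairwise_epistasis(f, i, j)
        stats[(i, j)] = (float(np.mean(eps ** 2)), float(np.var(eps)))
    return stats


# Toy 3-site landscape: additive effects plus one pairwise interaction.
x = np.indices((2, 2, 2))
f_pairwise = x[0] + 2.0 * x[1] + 0.5 * x[2] + 1.5 * x[0] * x[1]

# Same landscape with an added three-way interaction term.
f_threeway = f_pairwise + 1.0 * x[0] * x[1] * x[2]

pair_stats = summarize_pairs(f_pairwise)
three_stats = summarize_pairs(f_threeway)
```

In the purely pairwise toy landscape the local coefficient for sites (0, 1) is the same in every background, so its variance across backgrounds is zero. Adding the three-way term makes the coefficient depend on the allele at the third site, so the variance becomes positive, which illustrates the link between background-to-background variance and epistasis of order higher than k.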
