Linking Codon- and Protein-Level Mutation Scores to Population Genetics Reveals Heterogeneous Selection Efficiency Across Escherichia coli Lineages
Linking Codon- and Protein-Level Mutation Scores to Population Genetics Reveals Heterogeneous Selection Efficiency Across Escherichia coli Lineages
Mischler, M.; Vigue, L.; Croce, G.; Weigt, M.; Tenaillon, O.
AbstractQuantifying the selective effects of individual mutations is essential to understand how their population-wise frequencies evolve under natural selection and genetic drift. Large genomic datasets provide a real-life experiment that we exploit to characterize the efficiency of selection across different mutations types and populations. Using Direct Coupling Analysis, a model from statistical physics, we derive protein-informed scores for individual non-synonymous mutations identified in 81,440 Escherichia coli genomes. We show that these scores act as a latent variable capturing the probability that a mutation is beneficial, neutral, or mildly to highly deleterious. We contribute to the debate on the importance of synonymous mutations by demonstrating that their selection intensities span a single order of magnitude in the E. coli species, whereas non-synonymous mutations span six orders of magnitude. We further relate selection efficiency to genetic drift, defined as the inverse of population size, and to ecological lifestyle, and we identify a 10,000-fold reduction in selection efficiency between the entire E. coli species and its most pathogenic populations. Together, these results highlight how population genetics and protein variant fitness predictors inform one another: variation in selection efficiency is associated with shifts in the distribution of mutation scores, and population genetics data provide a benchmark to assess the accuracy of these scores.