Uncovering differential tolerance to deletions versus substitutions with a protein language model

Avatar
Poster
Voices Powered byElevenlabs logo
Connected to paperThis paper is a preprint and has not been certified by peer review

Uncovering differential tolerance to deletions versus substitutions with a protein language model

Authors

Goldman, G.; Chati, P.; Ntranos, V.

Abstract

Deep mutational scanning (DMS) experiments have been successfully leveraged to understand genotype to phenotype mapping, with broad implications for protein engineering, human genetics, drug development, and beyond. To date, however, the overwhelming majority of DMS have focused on amino acid substitutions, excluding other classes of variation such as deletions or insertions. As a consequence, it remains unclear how indels differentially shape the fitness landscape relative to substitutions. In order to further our understanding of the relationship between substitutions and deletions, we leveraged a protein language model to analyze every single amino acid deletion in the human proteome. We discovered hundreds of thousands of sites that display opposing behavior for deletions versus substitutions, i.e. sites that can tolerate being substituted but not deleted, and vice versa. We identified secondary structural elements and sequence context to be important mediators of differential tolerability at these sites. Our results underscore the value of deletion-substitution comparisons at the genome-wide scale, provide novel insights into how substitutions could systematically differ from deletions, and showcase the power of protein language models to generate biological hypotheses in-silico. All deletion-substitution comparisons can be explored and downloaded at https://huggingface.co/spaces/ntranoslab/diff-tol

Follow Us on

0 comments

Add comment