ProteinGym: Large-Scale Benchmarks for Protein Design and Fitness Prediction

By: Notin, P.; Kollasch, A. W.; Ritter, D.; van Niekerk, L.; Paul, S.; Spinner, H.; Rollins, N.; Shaw, A.; Weitzman, R.; Frazer, J.; Dias, M.; Franceschi, D.; Orenbuch, R.; Gal, Y.; Marks, D. S.

Predicting the effects of mutations in proteins is critical to many applications, from understanding genetic disease to designing novel proteins that can address our most pressing challenges in climate, agriculture and healthcare. Despite the surge in machine learning-based protein models to tackle these questions, an assessment of their respective benefits is challenging due to the use of distinct, often contrived, experimental datasets, and the variable performance of models across different protein families. Addressing these challenges requires scale. To that end we introduce ProteinGym, a large-scale and holistic set of benchmarks specifically designed for protein fitness prediction and design. It encompasses both a broad collection of over 250 standardized deep mutational scanning assays, spanning millions of mutated sequences, as well as curated clinical datasets providing high-quality expert annotations about mutation effects. We devise a robust evaluation framework that combines metrics for both fitness prediction and design, factors in known limitations of the underlying experimental methods, and covers both zero-shot and supervised settings. We report the performance of a diverse set of over 70 high-performing models from various subfields (e.g., alignment-based, inverse folding) within a unified benchmark suite. We open source the corresponding codebase, datasets, MSAs, structures, and model predictions, and develop a user-friendly website that facilitates data access and analysis.
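The abstract does not name the individual metrics in the evaluation framework. As a rough illustration only, one common way to compare a model's zero-shot scores with deep mutational scanning measurements is a rank correlation; the sketch below assumes Spearman's rho for that purpose. The function names and the toy values are hypothetical and are not taken from the ProteinGym codebase.

```python
from typing import Sequence


def average_ranks(values: Sequence[float]) -> list[float]:
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(predicted: Sequence[float], measured: Sequence[float]) -> float:
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    if len(predicted) != len(measured) or len(predicted) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    rx, ry = average_ranks(predicted), average_ranks(measured)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        raise ValueError("rank correlation is undefined for constant input")
    return cov / (vx * vy) ** 0.5


if __name__ == "__main__":
    # Hypothetical model scores and assay measurements for five variants.
    model_scores = [-2.1, -0.3, -4.0, 0.5, -1.2]
    assay_fitness = [0.40, 0.90, 0.10, 0.95, 0.70]
    print(f"Spearman rho = {spearman(model_scores, assay_fitness):.3f}")
```

A rank-based measure is used here because model scores and assay readouts are typically on different, non-linearly related scales; only the ordering of variants is compared.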
Targeted protein degradation systems to enhance Wnt signaling

By: Sampathkumar, P.; Jung, H.; Chen, H.; Zhang, Z.; Suen, N.; Yang, Y.; Huang, Z.; Lopez, T.; Benisch, R.; Lee, S.-J.; Ye, J.; Yeh, W.-C.; Li, Y.

Molecules that facilitate targeted protein degradation (TPD) offer great promise as novel therapeutics. The human hepatic lectin asialoglycoprotein receptor (ASGR) is selectively expressed on hepatocytes. We have previously engineered an anti-ASGR1 antibody-mutant RSPO2 (RSPO2RA) fusion protein (called SWEETS) to drive tissue-specific degradation of ZNRF3/RNF43 E3-ubiquitin ligases, leading to hepatocyte-specific enhanced Wnt signaling, proliferation, and restored liver function in mouse models. Such an antibody-RSPO2RA fusion molecule is currently in human clinical trials. In the current study, we identified two new ASGR1- and ASGR1/2-specific antibodies, 8M24 and 8G8. High-resolution crystal structures of ASGR1:8M24 and ASGR2:8G8 complexes revealed that these antibodies bind to distinct epitopes on the opposite sides of ASGR, away from the substrate binding site. Both antibodies enhanced Wnt activity when assembled as SWEETS molecules with RSPO2RA through specific effects sequestering E3 ligases. In addition, 8M24-RSPO2RA and 8G8-RSPO2RA efficiently downregulated ASGR1 through TPD mechanisms. These results demonstrate the possibility of combining different therapeutic effects and different degradation mechanisms in a single molecule.
PHEIGES, all-cell-free phage synthesis and selection from engineered genomes

By: Levrier, A.; Karpathakis, I.; Nash, B.; Bowden, S. D.; Lindner, A. B.; Noireaux, V.

Bacteriophages constitute an invaluable biological reservoir for biotechnology and medicine. The ability to exploit such vast resources is hampered by the lack of methods to rapidly engineer, assemble, package genomes, and select phages. Cell-free transcription-translation (TXTL) offers experimental settings to address such a limitation. Here, we describe PHage Engineering by In vitro Gene Expression and Selection (PHEIGES) using the T7 phage genome and Escherichia coli TXTL. Phage genomes are assembled in vitro from PCR-amplified fragments and directly expressed in batch TXTL reactions to produce up to 10^11 PFU/ml engineered phages within one day. We further demonstrate an important genotype-phenotype linkage of phage assembly in bulk TXTL. This enables rapid selection of phages with altered rough lipopolysaccharide specificity from phage genomes incorporating tail fiber mutant libraries. We establish the scalability of PHEIGES by one-pot assembly of such mutants with fluorescent gene integration and a 10% length-reduced genome.
ProteinNPT: Improving Protein Property Prediction and Design with Non-Parametric Transformers

By: Notin, P.; Weitzman, R.; Marks, D. S.; Gal, Y.

Protein design holds immense potential for optimizing naturally occurring proteins, with broad applications in drug discovery, material design, and sustainability. However, computational methods for protein engineering are confronted with significant challenges, such as an expansive design space, sparse functional regions, and a scarcity of available labels. These issues are further exacerbated in practice by the fact that most real-life design scenarios necessitate the simultaneous optimization of multiple properties. In this work, we introduce ProteinNPT, a non-parametric transformer variant tailored to protein sequences and particularly suited to label-scarce and multi-task learning settings. We first focus on the supervised fitness prediction setting and develop several cross-validation schemes which support robust performance assessment. We subsequently reimplement prior top-performing baselines, introduce several extensions of these baselines by integrating diverse branches of the protein engineering literature, and demonstrate that ProteinNPT consistently outperforms all of them across a diverse set of protein property prediction tasks. Finally, we demonstrate the value of our approach for iterative protein design across extensive in silico Bayesian optimization and conditional sampling experiments.
Validation of cell-free protein synthesis aboard the International Space Station

By: Kocalar, S.; Miller, B. M.; Huang, A.; Gleason, E.; Martin, K.; Foley, K.; Copeland, D. S.; Jewett, M.; Saavedra, E. A.; Kraves, S.

Cell-free protein synthesis (CFPS) is a rapidly maturing in vitro gene expression platform that can be used to transcribe and translate nucleic acids at the point of need, enabling on-demand synthesis of peptide-based vaccines and biotherapeutics, as well as the development of diagnostic tests for environmental contaminants and infectious agents. Unlike traditional cell-based systems, CFPS platforms do not require the maintenance of living cells and can be deployed with minimal equipment; therefore, they hold promise for applications in low-resource contexts, including spaceflight. Here we evaluate the performance of the cell-free BioBits® platform aboard the International Space Station by expressing RNA-based aptamers and fluorescent proteins that can serve as biological indicators. We validate two classes of biological sensors that detect either the small molecule DFHBI or a specific RNA sequence. Upon detection of their respective analytes, both biological sensors produce fluorescent readouts that are visually confirmed using a handheld fluorescence viewer and imaged for quantitative analysis. Our findings provide insight into the kinetics of cell-free transcription and translation in a microgravity environment and reveal that both biosensors perform robustly in space. Our findings lay the groundwork for portable, low-cost applications ranging from point-of-care health monitoring to on-demand detection of environmental hazards in low-resource communities both on Earth and beyond.
Transcription attenuation in synthetic promoters in tandem formation

By: Chauhan, V.; Baptista, I. S. C.; Jagadeesan, R.; Dash, S.; Ribeiro, A. S.

Closely spaced promoters are ubiquitous in prokaryotic and eukaryotic genomes. How their structure and dynamics relate remains unclear, particularly for tandem formations. To study their transcriptional interference, we engineered two pairs and one trio of synthetic promoters in non-overlapping, tandem formation, in single-copy plasmids. From in vivo measurements in E. coli cells, we found that promoters in tandem formation have attenuated transcription rates. The attenuation strength can be widely fine-tuned by the promoters' positioning, natural regulatory mechanisms, and other factors, including the antibiotic rifampicin, which hampers RNAP promoter escape. From this, and supported by in silico models, we concluded that the attenuation emerges from premature terminations generated by collisions between RNAPs elongating from upstream promoters and RNAPs occupying downstream promoters. Moreover, we found that these collisions can cause one or both RNAPs to fall off. The broad spectrum of possible, externally regulated, attenuation strengths in synthetic tandem promoters should make these structures valuable internal regulators of future synthetic circuits.
Population suppression with dominant female-lethal alleles is boosted by homing gene drive

By: Zhu, J.; Chen, J.; Liu, Y.; Xu, X.; Champer, J.

Methods to suppress pest insect populations using genetic constructs and repeated releases of male homozygotes have recently been shown to be an attractive alternative to the older sterile insect technique based on radiation. Female-specific lethal alleles have substantially increased power, but still require large, sustained transgenic insect releases. Gene drive alleles bias their own inheritance to spread throughout populations, potentially allowing population suppression with a single, small-size release. However, suppression drives often suffer from efficiency issues, and the most well-studied type, homing drives, tend to spread without limit. In this study, we show that coupling female-specific lethal alleles with homing gene drive allowed substantial improvement in efficiency while still retaining the self-limiting nature (and thus confinement) of a lethal allele strategy. Using a mosquito model, we show the required release sizes for population elimination in a variety of scenarios, including different density growth curves, with comparisons to other systems. Resistance alleles reduced the power of this method, but these could be overcome by targeting an essential gene with the drive while also providing rescue. A proof-of-principle demonstration of this system in Drosophila melanogaster was effective in both biasing its inheritance and achieving high lethality among females that inherit the construct in the absence of antibiotic. Overall, our study shows that substantial improvements can be achieved in female-specific lethal systems for population suppression by combining them with a gene drive.
Discovery of high affinity and specificity stapled peptide Bcl-xL inhibitors using bacterial surface display

By: Case, M.; Vinh, J.; Kopp, A.; Smith, M. D.; Thurber, G.

Intracellular protein-protein interactions are involved in many different diseases, making them prime targets for therapeutic intervention. Several diseases are characterized by their overexpression of Bcl-xL, an anti-apoptotic B cell lymphoma 2 (Bcl-2) protein expressed on mitochondrial membranes. Bcl-xL overexpression inhibits apoptosis, and selective inhibition of Bcl-xL has the potential to increase cancer cell death while leaving healthy cells comparatively less affected. However, high homology between Bcl-xL and other Bcl-2 proteins has made it difficult to selectively inhibit this interaction by small molecule drugs. We engineered stapled peptides, a chemical modification that can improve cell penetration, protease stability, and conformational stability, towards the selective inhibition of Bcl-xL. To accomplish this task, we built a focused combinatorial mutagenesis library of peptide variants on the bacterial cell surface, used copper-catalyzed click chemistry to form stapled peptides, and sorted the library for high binding to Bcl-xL and minimal binding towards other Bcl-2 proteins. We characterized the sequence and staple placement trends that governed specificity and identified molecules with ~10 nM affinity to Bcl-xL and greater than 100-fold selectivity versus other Bcl-2 family members on and off the cell surface. We confirmed that the mechanism of action of these peptides is consistent with apoptosis biology through mitochondrial outer membrane depolarization assays (MOMP). Overall, high affinity (10 nM Kd) and high specificity (100-fold selectivity) peptides were developed to target the Bcl-xL protein. These results demonstrate that stapled alpha helical peptides are promising candidates for the specific treatment of cancers driven by Bcl-2 dysregulation.
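To give a feel for what a 10 nM Kd combined with 100-fold selectivity could mean in terms of target occupancy, the following back-of-the-envelope sketch may help. It assumes simple 1:1 equilibrium binding with free peptide in excess, where the fraction of target bound is [L] / ([L] + Kd), and it interprets "100-fold selectivity" as a 100-fold higher Kd for off-target Bcl-2 family members. Both are simplifying assumptions rather than statements from the paper, and the function name and concentrations are hypothetical.

```python
def fraction_bound(ligand_nM: float, kd_nM: float) -> float:
    """Equilibrium fraction of target occupied under simple 1:1 binding."""
    if ligand_nM < 0 or kd_nM <= 0:
        raise ValueError("ligand must be >= 0 and Kd must be > 0")
    return ligand_nM / (ligand_nM + kd_nM)


# Values quoted in the abstract; the off-target Kd is derived from them
# under the assumption that selectivity is a ratio of Kd values.
KD_ON_TARGET_NM = 10.0
SELECTIVITY_FOLD = 100.0
KD_OFF_TARGET_NM = KD_ON_TARGET_NM * SELECTIVITY_FOLD

if __name__ == "__main__":
    # Illustrative free-peptide concentrations (hypothetical).
    for conc_nM in (1.0, 10.0, 100.0):
        on_target = fraction_bound(conc_nM, KD_ON_TARGET_NM)
        off_target = fraction_bound(conc_nM, KD_OFF_TARGET_NM)
        print(f"{conc_nM:6.1f} nM peptide: "
              f"on-target {on_target:.3f}, off-target {off_target:.3f}")
```

Under these assumptions, a peptide concentration equal to the on-target Kd occupies half of the Bcl-xL sites while occupying about 1% of an off-target protein's sites.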
Discovery of a high-performance phage-derived promoter/repressor system for probiotic lactobacillus engineering

By: Blanch-Asensio, M.; Tadimarri, V. S.; Wilk, A.; Sankaran, S.

Background: The Lactobacillus family comprises many species of great importance for the food and healthcare industries, with numerous strains identified as beneficial for humans and used as probiotics. Hence, there is a growing interest in engineering these probiotic bacteria as live biotherapeutics for animals and humans. However, the genetic parts needed to regulate gene expression in these bacteria remain limited compared to model bacteria like E. coli or B. subtilis. To address this deficit, in this study, we selected and tested several bacteriophage-derived genetic parts with the potential to regulate transcription in lactobacilli. Results: We screened genetic parts from 6 different lactobacilli-infecting phages and identified one promoter/repressor system with unprecedented functionality in L. plantarum WCFS1. The phage-derived promoter was found to achieve expression levels nearly 9-fold higher than the previously reported strongest promoter in this strain, and the repressor was able to almost completely repress this expression by reducing it nearly 500-fold. Conclusions: The new parts and insights gained from their engineering will enhance the genetic programmability of lactobacilli for healthcare and industrial applications.
High yield, low magnesium flexizyme reactions in a water-ice eutectic phase

By: Davisson, J.; Alejo, J.; Blank, M.; Kalb, E.; Prasad, A.; Knudson, I.; Schepartz, A.; Engelhart, A. E.; Adamala, K. P.

Flexizymes enable the stoichiometric acylation of tRNAs with a variety of compounds, enabling the in vitro translation of peptides with both non-natural backbones and side chains. However, flexizyme reactions have several drawbacks, including single-turnover kinetics, high Mg(II) carryover inhibiting in vitro translation, and rapid product hydrolysis. Here we present flexizyme reactions utilizing an ice-eutectic phase, with high yields, 30-fold lower Mg(II), and long-term product stability. The eutectic flexizyme reactions increase the ease of use and flexibility of flexizyme aminoacylation, and increase in vitro protein production.