Machine-learning-driven prediction and design of intrinsic transcription terminators

This paper is a preprint and has not been certified by peer review.

Authors

Kundlatsch, G. E.; Neto, A. P. d. S.; de Paiva, G. B.; Rech, E.; Duarte, L. T.; Pedrolli, D. B.

Abstract

Intrinsic transcription terminators are biological parts critical for controlling gene expression in natural genomes and are fundamental to the modularity and predictability of synthetic gene circuits. Despite their simple structure and function, it has not yet been possible to rationally engineer synthetic terminators with a predefined strength or to accurately predict their strength from sequence. Here, we leveraged a curated library of bacterial terminators to train a data-driven predictive model and, building on this surrogate model, developed open-source software tools for predicting terminator performance and designing new intrinsic terminator sequences. Model interpretability analysis indicates that U-tract features emphasize a distal region longer than previously anticipated and that the influence of the initial hairpin GC content extends beyond the previously reported range. Using the final trained model, we implemented two software tools. The Terminator Strength Predictor (TerSP) computes the full feature representation directly from an input sequence and outputs a quantitative strength prediction together with a binary strong/weak classification. We validated TerSP using experimentally characterized terminators from bacteria other than E. coli. The Terminator Factory (TerFac) implements a surrogate-based optimization framework for target-driven terminator design under user-defined strength and length constraints. Using TerFac, we enumerated length-specific sets of maximally strong terminators, designed optimized synthetic terminators, and optimized a wild-type terminator. The designed terminators were validated in vivo in E. coli and in vitro, using a newly developed assay based on fluorescent RNA aptamers. The TerFac-designed terminators showed the expected strength, and the strongest one outperformed the best reference terminator in the training dataset, both in vivo and in vitro. These results indicate that the model captured sequence-to-function rules that are informative both for forward prediction (TerSP) and for the design of terminators with defined strength (TerFac).
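
The general idea of surrogate-based, target-driven design described above can be illustrated with a minimal Python sketch. It is not the TerSP or TerFac implementation, whose features, model, and interfaces are not given in this abstract. The feature set (`gc_body`, `u_tail`), the scoring function `toy_surrogate`, its equal weights, and the mutation-based search in `design` are all hypothetical stand-ins chosen only to show how a sequence-to-score function can steer a search toward a target strength at a fixed length.

```python
import random


def features(seq: str, tract_len: int = 8) -> dict:
    """Toy features (assumed, not the paper's feature set).

    gc_body: GC fraction of everything upstream of the 3' tail.
    u_tail:  U (written as T) fraction of the last `tract_len` nucleotides.
    """
    seq = seq.upper().replace("U", "T")
    body, tail = seq[:-tract_len], seq[-tract_len:]
    gc_body = sum(base in "GC" for base in body) / max(len(body), 1)
    u_tail = tail.count("T") / len(tail)
    return {"gc_body": gc_body, "u_tail": u_tail}


def toy_surrogate(seq: str) -> float:
    """Hypothetical stand-in for a trained model; returns a score in [0, 1]."""
    f = features(seq)
    return 0.5 * f["gc_body"] + 0.5 * f["u_tail"]


def design(target: float, length: int, steps: int = 2000, seed: int = 0):
    """Single-point-mutation search toward a target surrogate score.

    A candidate replaces the current sequence only if it is at least as
    close to the target, so the error never increases during the search.
    """
    rng = random.Random(seed)
    best = "".join(rng.choice("ACGT") for _ in range(length))
    best_err = abs(toy_surrogate(best) - target)
    for _ in range(steps):
        i = rng.randrange(length)
        cand = best[:i] + rng.choice("ACGT") + best[i + 1:]
        err = abs(toy_surrogate(cand) - target)
        if err <= best_err:
            best, best_err = cand, err
    return best, toy_surrogate(best)


if __name__ == "__main__":
    seq, score = design(target=0.8, length=40)
    print(seq, round(score, 3))
```

In a real workflow the toy scoring function would be replaced by the trained predictor, and the search would typically also enforce structural constraints such as hairpin formation, which this sketch ignores.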
