Machine learning-assisted enzyme engineering through ultra-high throughput sorting and large-scale sequence-function data generation

This paper is a preprint and has not been certified by peer review.

Authors

Zhang, J.; Shanmugam, S.; Yeoh, J. W.; Zheng, D.; Goh, J. R.; Lin, Z.; Poh, C. L.

Abstract

Machine learning (ML) shows great promise in protein engineering, but it has yet to be integrated with ultra-high throughput sorting (ultra-HTS) and next-generation sequencing (NGS) for large-scale sequence-function data generation, which would let it explore a wider search space and more complex mutation events. Here, we introduce PUSDA, a framework that rapidly sorts mutant libraries into multiple performance groups with good accuracy and generates large-scale sequence-function data to power ML-driven protein design. As a demonstration, PUSDA generated over five million sequence-function data points for an enzyme, and data processing revealed that over 1.3 million unique enzyme mutants were sorted in a single day. With a trained ML model that achieved 93.52% accuracy, we further analysed combinatorial mutation events and applied a ratio-based selection approach to design novel enzyme sequences. A validation experiment demonstrated a 16.67-fold improvement in the efficiency of identifying high-performance enzymes using PUSDA compared with ultra-HTS alone. The designed novel enzyme achieved an 8.23-fold increase in productivity compared with the wild type. PUSDA lays a foundation for integrating ultra-HTS, NGS, and ML in future predictive enzyme engineering, offering a data-driven tool for accelerating breakthroughs in biotechnology.
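
The abstract does not define the ratio used in its ratio-based selection, so the following is only an illustrative sketch of one common way to rank variants from sorted-bin sequencing reads, not the authors' method. It assumes two hypothetical performance bins ("high" and "low"), represents each read as a plain sequence string, and scores each variant by its pseudocount-smoothed frequency in the high bin divided by its frequency in the low bin. The function names, the two-bin layout, and the pseudocount are all assumptions made for illustration.

```python
from collections import Counter


def enrichment_ratios(high_reads, low_reads, pseudocount=1.0):
    """Score each variant by (frequency in high bin) / (frequency in low bin).

    Hypothetical helper: a pseudocount is added to every variant in both
    bins so that variants missing from one bin do not cause division by zero.
    """
    high = Counter(high_reads)
    low = Counter(low_reads)
    variants = set(high) | set(low)

    # Normalise by smoothed bin totals so bins with different read depths
    # are comparable.
    n_high = len(high_reads) + pseudocount * len(variants)
    n_low = len(low_reads) + pseudocount * len(variants)

    return {
        seq: ((high[seq] + pseudocount) / n_high)
        / ((low[seq] + pseudocount) / n_low)
        for seq in variants
    }


def select_top(ratios, k):
    """Return the k variants with the largest enrichment ratio."""
    return sorted(ratios, key=ratios.get, reverse=True)[:k]


# Toy reads (made up for illustration, not data from the paper).
high_reads = ["AAA"] * 8 + ["AAC"] * 2
low_reads = ["AAA"] * 1 + ["AAC"] * 9

ratios = enrichment_ratios(high_reads, low_reads)
best = select_top(ratios, 1)
```

In this toy input, "AAA" dominates the high bin and "AAC" dominates the low bin, so "AAA" ranks first. A real pipeline would also need read-quality filtering and a minimum read-count threshold, which are omitted here.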
