PreDSLpmo V2.0: A deep learning-based prediction tool for functional annotation of lytic polysaccharide monooxygenases
Arulselvan, K. S.; Pulavendran, P.; Saravanan, V.; Yennamalli, R. M.
Abstract
Lytic polysaccharide monooxygenases (LPMOs) are redox enzymes that catalyze the oxidative cleavage of recalcitrant polysaccharides, facilitating efficient biomass deconstruction for biofuel production. While computational annotation tools have accelerated LPMO discovery, the existing model is limited to the AA9 and AA10 families, leaving the broader diversity of LPMOs underexplored. In this study, we present an improved machine learning framework for classifying and annotating LPMOs across eight families (AA9, AA10, AA11, AA13, AA14, AA15, AA16, and AA17). We curated a high-quality dataset by integrating sequences from GenBank, the Joint Genome Institute, and the CAZy database, followed by data cleaning and redundancy reduction using CD-HIT; the curated dataset comprises 33,691 LPMO sequences. Feature extraction was performed using both Python (iFeature) and R-based pipelines, generating over 13,526 descriptors per sequence to capture compositional, physicochemical, and structural properties. Ensemble feature selection was used to identify the significant features for both binary and multiclass classification. To address data imbalance, we maintained a 1:1 ratio between all positive and negative sets during both training and validation. A range of machine learning models was systematically trained and evaluated, and an independent dataset was used to estimate the trained models' performance. The multiclass Bi-LSTM model demonstrated higher accuracy, robustness, and generalizability, outperforming traditional approaches. We also compared its performance with dbCAN3, a CAZyme predictor. The final model was deployed as a user-friendly web server (https://predlpmo.in), enabling high-throughput, sequence-based functional annotation of LPMOs for the research community. This work bridges a critical gap in LPMO annotation, providing a scalable and reliable solution for enzyme discovery in bioenergy and industrial biotechnology.
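As an illustration of two steps summarized above, the following minimal sketch computes one simple compositional descriptor (amino acid composition, a descriptor type of the kind iFeature provides) and enforces a 1:1 positive-to-negative ratio by random downsampling. It is not the authors' pipeline: the function names, the toy sequences, and the choice of downsampling as the balancing strategy are all assumptions made for illustration, and the code uses only the Python standard library rather than iFeature itself.

```python
import random
from collections import Counter

# The 20 standard amino acids, in a fixed order so feature vectors align.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(sequence):
    """Return the fraction of each standard residue in a protein sequence.

    Hypothetical helper: one simple compositional descriptor, not the
    full descriptor set described in the abstract.
    """
    seq = sequence.upper()
    counts = Counter(ch for ch in seq if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [counts[aa] / total for aa in AMINO_ACIDS]


def balance_one_to_one(positives, negatives, seed=0):
    """Randomly downsample the larger class so both classes have equal size.

    Assumption: downsampling is only one possible way to reach a 1:1 ratio.
    """
    rng = random.Random(seed)
    n = min(len(positives), len(negatives))
    return rng.sample(positives, n), rng.sample(negatives, n)


# Toy sequences, invented purely for demonstration.
toy_pos = ["HGYVAC", "HGFIEN"]
toy_neg = ["MKKLLA", "MSTNPK", "MAAAGG"]

pos, neg = balance_one_to_one(toy_pos, toy_neg)
features = [amino_acid_composition(s) for s in pos + neg]
labels = [1] * len(pos) + [0] * len(neg)
```

Each sequence becomes a fixed-length vector of 20 fractions, which is the form a downstream classifier expects; the real feature set is far larger and is reduced by feature selection before training.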