ESMStabP: A Regression Model for Protein Thermostability Prediction
Ramos, M. A.; Jernigan, R. L.; Kilinc, M.
Abstract
Accurately predicting protein thermostability is crucial for numerous applications in biotechnology, pharmaceuticals, and food science. Experimental methods for determining protein melting temperatures are often time-consuming and costly, driving the need for efficient computational alternatives. In this paper, we introduce ESMStabP, an enhanced regression model for predicting protein thermostability. To improve model performance and generalizability, we assembled a significantly larger dataset by combining and cleaning datasets previously used by other thermostability models. ESMStabP builds on DeepStabP and improves on it by using embeddings from the ESM2 protein language model together with thermophilic classifications. ESMStabP outperforms DeepStabP and other existing predictors, achieving a coefficient of determination (R²) of 0.95 and a Pearson correlation coefficient (PCC) of 0.97. Despite these improvements, challenges such as dataset availability remain. This work underscores the critical role of specific layer identification in model development and outlines potential directions for future advances in protein stability prediction.
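For readers unfamiliar with the two reported metrics, the following is a minimal sketch of how a coefficient of determination and a Pearson correlation coefficient are conventionally computed for a regression model. It is not the authors' evaluation code; the function names and the melting-temperature values are illustrative assumptions, not data from the paper.

```python
import numpy as np


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)          # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total sum of squares
    return float(1.0 - ss_res / ss_tot)


def pearson_cc(y_true, y_pred):
    """Pearson correlation coefficient between measured and predicted values."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.corrcoef(y_true, y_pred)[0, 1])


# Hypothetical melting temperatures in degrees Celsius (made up for illustration).
measured = [45.0, 52.0, 60.0, 71.0, 88.0]
predicted = [47.0, 50.0, 62.0, 70.0, 85.0]

r2 = r2_score(measured, predicted)
pcc = pearson_cc(measured, predicted)
print(f"R^2 = {r2:.3f}, PCC = {pcc:.3f}")
```

The two metrics measure different things: PCC is unchanged by any linear rescaling of the predictions with positive slope, whereas the coefficient of determination penalizes systematic offset and scale errors, so PCC squared is never smaller than R² computed this way.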