Information Leakage in Enzyme Substrate Prediction
Atabaigi Elmi, V.; Joeres, R.; Kalinina, O. V.
Abstract
Enzymes are essential catalysts in many cellular processes. Understanding their interactions with small molecules, such as regulators, cofactors, and, most importantly, substrates, is crucial for understanding the biochemical processes that occur in cells. Correctly interpreting the roles of the small molecules that interact with an enzyme is key to elucidating its function. Recently, the prediction of enzyme-small molecule interactions has attracted growing interest from computational, and especially deep-learning, approaches, and numerous datasets and models with remarkable reported performance have been published. In this work, we critically examine one of the most popular datasets and three models trained on it, identifying leaked information that may inflate reported model performance. We show that the inspected models are susceptible to information leakage and that their performance drops to near-random levels when the leakage is removed.