To fly, or not to fly, that is the question: A deep learning model for peptide detectability prediction in mass spectrometry
Abdul-Khalek, N.; Picciani, M.; Wimmer, R.; Overgaard, M. T.; Wilhelm, M.; Gregersen Echers, S.
Abstract
Identifying detectable peptides, known as flyers, is key in mass spectrometry (MS)-based proteomics. Peptide detectability is strongly related to the peptide sequence and its resulting physicochemical properties. However, the high variability in MS data, particularly in peptide detectability and intensity across multiple analyses and samples, makes the development of a generic model for detectability prediction infeasible. This underlines the need for tools that can be refined for specific experimental conditions. To address this need, we present Pfly, a deep learning model developed to predict peptide detectability based solely on the peptide sequence. Pfly distinguishes itself as a versatile and reliable state-of-the-art tool, offering high performance, accessibility, and easy customizability for end users. This adaptability allows researchers to tailor the model to their specific experimental conditions, facilitating the creation of lab-specific models. This, in turn, can lead to more accurate results and expand the model's applicability across various research fields. The model's architecture is an encoder-decoder with an attention mechanism. The tool classifies peptides as either flyers or non-flyers, providing both binary probabilities and detailed categorical probabilities for four distinct classes defined in this study: non-flyer, weak flyer, intermediate flyer, and strong flyer. The model was initially trained on a synthetic peptide library and subsequently fine-tuned on a biological dataset to mitigate bias towards synthesizability. This improved its predictive capacity and allowed it to outperform state-of-the-art predictors in a benchmark comparison. The study further investigates the influence of protein abundance and of the search engine, illustrating the negative impact of misclassification on peptide identification. Pfly has been integrated into the DLOmix framework and is accessible on GitHub at https://github.com/wilhelm-lab/dlomix.
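The abstract states that Pfly reports both four-class probabilities and a binary flyer/non-flyer probability, but it does not say how the two are related. The short Python sketch below shows one plausible way to reduce a four-class output to a binary call. It assumes that the flyer probability is the sum of the weak, intermediate, and strong flyer probabilities (equivalently 1 − P(non-flyer)) and that a 0.5 decision threshold is used. The function name `summarize_detectability` and the threshold are hypothetical and are not part of the Pfly or DLOmix API.

```python
from typing import Dict, Sequence

# Class order as listed in the abstract; using this order for the model output is an assumption.
CLASSES = ("non-flyer", "weak flyer", "intermediate flyer", "strong flyer")


def summarize_detectability(probs: Sequence[float], threshold: float = 0.5) -> Dict[str, object]:
    """Hypothetical helper: reduce four-class probabilities to a binary flyer call.

    Assumes P(flyer) = P(weak) + P(intermediate) + P(strong) = 1 - P(non-flyer).
    The 0.5 threshold is an illustrative default, not a value taken from Pfly.
    """
    if len(probs) != len(CLASSES):
        raise ValueError(f"expected {len(CLASSES)} class probabilities, got {len(probs)}")
    if any(p < 0 for p in probs) or abs(sum(probs) - 1.0) > 1e-6:
        raise ValueError("probabilities must be non-negative and sum to 1")

    # Sum the three flyer classes to get the binary flyer probability.
    p_flyer = sum(probs[1:])
    # Most probable of the four categorical classes.
    top_index = max(range(len(CLASSES)), key=lambda i: probs[i])
    return {
        "p_flyer": p_flyer,
        "is_flyer": p_flyer >= threshold,
        "top_class": CLASSES[top_index],
    }


if __name__ == "__main__":
    # Made-up probabilities for a single peptide, for illustration only.
    print(summarize_detectability([0.1, 0.2, 0.3, 0.4]))
```

Keeping the categorical output alongside the binary call retains the distinction between weak and strong flyers, which a single flyer probability would discard.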