Transparent exploration of machine learning for biomarker discovery from proteomics and omics data
Strauss, M. T.; Torun, F. M.; Virreira Winter, S.; Doll, S.; Riese, F. M.; Vorobyev, A.; Müller-Reif, J. B.; Geyer, P. E.
Abstract
Biomarkers are of central importance for assessing health status and for guiding medical interventions and evaluating their efficacy, but they are lacking for most diseases. Mass spectrometry (MS)-based proteomics is a powerful technology for biomarker discovery but requires sophisticated bioinformatics to identify robust patterns. Machine learning (ML) has become indispensable for this purpose; however, it is sometimes applied in an opaque manner and generally requires expert knowledge as well as complex and expensive software. To enable easy access to ML for biomarker discovery without any programming or bioinformatics skills, we developed OmicLearn (https://OmicLearn.com), an open-source, web-based ML tool using the latest advances in the Python ML ecosystem. We host a web server on which researchers can explore their results and which can readily be cloned for internal use. Output tables from proteomics experiments are easily uploaded to the central or a local web server. OmicLearn enables rapid exploration of the suitability of various ML algorithms for the experimental datasets. It fosters open science via transparent assessment of state-of-the-art algorithms in a standardized format for proteomics and other omics sciences.
Graphical Abstract
[Figure 1: graphical abstract]

Highlights
- OmicLearn is an open-source platform that allows researchers to apply machine learning (ML) for biomarker discovery
- The ready-to-use structure of OmicLearn enables access to state-of-the-art ML algorithms without requiring any prior bioinformatics knowledge
- OmicLearn's web-based interface provides an easy-to-follow platform for classification and gaining insights into the dataset
- Several algorithms and methods for preprocessing, feature selection, classification and cross-validation of omics datasets are integrated
- All results, settings and method text can be exported in publication-ready formats
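
The stages named in the highlights (preprocessing, feature selection, classification and cross-validation) can be illustrated with a minimal Python sketch. It is not OmicLearn's own code: the use of scikit-learn, the specific estimators, the parameter values and the synthetic data are all assumptions chosen for illustration, since the text above states only that the tool builds on the Python ML ecosystem.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for a samples x proteins intensity table (assumption:
# real input would be an output table from a proteomics experiment).
X, y = make_classification(
    n_samples=120, n_features=200, n_informative=10, random_state=0
)

# Simulate missing values, which are common in proteomics tables.
rng = np.random.default_rng(0)
X[rng.random(X.shape) < 0.05] = np.nan

# Illustrative choices for each stage; none are claimed to be the tool's defaults.
pipeline = Pipeline([
    ("impute", SimpleImputer(strategy="median")),      # preprocessing
    ("scale", StandardScaler()),                       # preprocessing
    ("select", SelectKBest(f_classif, k=20)),          # feature selection
    ("classify", LogisticRegression(max_iter=1000)),   # classification
])

# Stratified k-fold cross-validation; fitting every stage inside each fold
# keeps feature selection from seeing the held-out samples.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(pipeline, X, y, cv=cv, scoring="roc_auc")

print("ROC AUC per fold:", np.round(scores, 3))
print("Mean ROC AUC:", round(float(scores.mean()), 3))
```

Wrapping all stages in a single pipeline is a deliberate choice here: imputation, scaling and feature selection are refit on the training portion of each fold, so the reported scores are not inflated by information leaking from the test samples.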