Evaluating Limits of Machine Learning-Assisted Raman Spectroscopy in Classification of Biological Samples

This paper is a preprint and has not been certified by peer review.

Authors

Yadav, A.; Birkby, A.; Armstrong, N.; Arnob, A.; Chou, M.-H.; Fernandez, A.; Verhoef, A. J.; Yi, Z.; Gulati, S.; Kotnis, S.; Sun, Q.; Kao, K. C.; Wu, H.-J.

Abstract

Machine learning (ML)-assisted Raman spectroscopy has become a powerful analytical tool for classifying and identifying analytes; however, the technical challenges that affect its detection accuracy have not been investigated. This study explores the experimental factors that affect classification performance. Among the evaluated ML models, the choice of algorithm has minimal impact on classification accuracy. Instead, experimental factors, including the spectral similarity between tested samples and the data quality, dominate detection performance. Increases in spectral noise and in spectral similarity both significantly reduce classification accuracy. In well-controlled samples with low experimental noise, ML-assisted Raman spectroscopy can discriminate lipid mixtures that differ in composition by 1.85 mol%. To assess the effect of biological heterogeneity, we analyzed single-cell Raman spectra from Saccharomyces cerevisiae strains carrying single, double, or triple gene mutations. Intrinsic cell-to-cell variability introduced substantial spectral differences, severely reducing the accuracy of multiclass classification of these genetically similar strains at the single-cell level. Averaging Raman spectra across multiple cells improved classification accuracy by reducing this spectral variability. We also assess the effectiveness of transfer learning across different Raman spectrometers, specifically by applying an ML model trained on one instrument to data from another Raman spectrometer. Transfer learning can be improved with proper instrument calibration, highlighting the importance of instrument standardization. Overall, our results demonstrate that data quality and spectral similarity are the primary bottlenecks in ML-assisted Raman spectroscopy. Careful attention to sample preparation, data acquisition, measurement conditions, and instrument calibration is critical to achieving robust and reliable classification performance.
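
The effect of averaging spectra across cells can be illustrated with a small simulation. The sketch below is not the authors' pipeline: it uses synthetic single-peak spectra, models cell-to-cell variability as independent Gaussian noise, and replaces the evaluated ML models with a simple nearest-template classifier. The function names, peak shape, noise level, and sample counts are all assumptions chosen for illustration.

```python
import math
import random


def make_template(scale, n_channels=100):
    # Hypothetical class spectrum: a single Gaussian-shaped peak at channel 50.
    return [scale * math.exp(-((i - 50) / 5.0) ** 2) for i in range(n_channels)]


def noisy_spectrum(template, sigma, rng):
    # Assumption: cell-to-cell variability is independent Gaussian noise per channel.
    return [v + rng.gauss(0.0, sigma) for v in template]


def average_spectra(spectra):
    # Channel-wise mean over a list of equal-length spectra.
    n = len(spectra)
    return [sum(column) / n for column in zip(*spectra)]


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def classify(spectrum, templates):
    # Nearest-template rule: return the index of the closest class spectrum.
    return min(range(len(templates)),
               key=lambda k: squared_distance(spectrum, templates[k]))


def accuracy(templates, sigma, cells_per_sample, n_trials, rng):
    # Each trial averages `cells_per_sample` simulated cells before classifying.
    correct = 0
    for trial in range(n_trials):
        label = trial % len(templates)
        cells = [noisy_spectrum(templates[label], sigma, rng)
                 for _ in range(cells_per_sample)]
        if classify(average_spectra(cells), templates) == label:
            correct += 1
    return correct / n_trials


rng = random.Random(0)
# Two highly similar classes: peak heights differ by 10%.
templates = [make_template(1.0), make_template(1.1)]
single_cell_acc = accuracy(templates, 0.25, 1, 400, rng)
averaged_acc = accuracy(templates, 0.25, 16, 400, rng)
print(f"single-cell accuracy: {single_cell_acc:.2f}")
print(f"16-cell average accuracy: {averaged_acc:.2f}")
```

Under these assumptions, averaging n cells shrinks the noise standard deviation by a factor of sqrt(n), so the averaged spectra separate two similar classes far more reliably than single-cell spectra do. Real cell-to-cell variability need not be independent or Gaussian, so the gain from averaging in practice may be smaller.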
