scLKME: A Landmark-based Approach for Generating Multi-cellular Sample Embeddings from Single-cell Data
Yi, H.; Stanley, N.
Abstract
Single-cell technologies enable high-dimensional profiling of individual cells, thereby offering insight into subtle variation between specialized cell types. However, translating the many nuanced cellular profiles into meaningful per-sample representations is challenging because cellular composition is heterogeneous across the profiled samples. To compute informative per-sample representations, we developed scLKME, a novel approach that uses a landmark-based kernel mean embedding method to convert multi-sample single-cell data into compact per-sample embeddings. Treating each sample as a distribution over cells, scLKME identifies landmarks across samples and maps these distributions into a reproducing kernel Hilbert space. Overall, scLKME outperforms state-of-the-art techniques in the robustness, efficiency, accuracy, and practical usefulness of its sample embeddings. Applied to a CyTOF dataset profiling immune responses in preterm birth, scLKME accurately identified patient-specific variation that correlates with gestational age, suggesting broad applicability to multi-sample single-cell datasets with complex experimental designs. scLKME is available as an open-source Python package at https://github.com/CompCy-lab/scLKME.
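The idea of a landmark-based kernel mean embedding can be illustrated with a minimal sketch. It is not the scLKME package API: the function names `rbf_kernel` and `landmark_kme`, the RBF kernel, the default bandwidth, and the choice of landmarks by uniform random subsampling of the pooled cells are all assumptions made for illustration, and the actual landmark selection in scLKME may differ. Each sample S is summarized by the vector whose j-th entry is the average of k(x, z_j) over the cells x in S, where z_j is the j-th landmark.

```python
import numpy as np


def rbf_kernel(X, Z, gamma):
    """RBF kernel matrix between rows of X (cells) and rows of Z (landmarks)."""
    sq_dist = (X ** 2).sum(axis=1)[:, None] + (Z ** 2).sum(axis=1)[None, :] - 2.0 * X @ Z.T
    # Clip tiny negative values caused by floating-point round-off.
    return np.exp(-gamma * np.maximum(sq_dist, 0.0))


def landmark_kme(samples, n_landmarks=32, gamma=None, seed=0):
    """Hypothetical helper: embed each sample (cells x features) as a vector.

    Assumption: landmarks are drawn uniformly at random from the pooled cells.
    """
    rng = np.random.default_rng(seed)
    pooled = np.vstack(samples)
    n_landmarks = min(n_landmarks, pooled.shape[0])
    idx = rng.choice(pooled.shape[0], size=n_landmarks, replace=False)
    landmarks = pooled[idx]
    if gamma is None:
        gamma = 1.0 / pooled.shape[1]  # assumed default bandwidth
    # Kernel mean embedding evaluated at the landmarks: average over cells.
    emb = np.vstack([rbf_kernel(X, landmarks, gamma).mean(axis=0) for X in samples])
    return emb, landmarks


# Synthetic data: two groups of three samples, 200 cells x 5 markers each.
rng = np.random.default_rng(1)
group_a = [rng.normal(0.0, 1.0, size=(200, 5)) for _ in range(3)]
group_b = [rng.normal(3.0, 1.0, size=(200, 5)) for _ in range(3)]
emb, landmarks = landmark_kme(group_a + group_b, n_landmarks=32)
print(emb.shape)
```

Each row of `emb` is a fixed-length representation of one sample, regardless of how many cells that sample contains, so the rows can be passed to standard downstream tools such as clustering or regression.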