Semi-supervised Omics Factor Analysis (SOFA) disentangles known sources of variation from latent factors in multi-omics data
Capraz, T.; Vöhringer, H. S.; Huber, W.
Abstract
Group Factor Analysis is a family of methods for representing patterns of correlation between features in tabular data [1]. Argelaguet et al. identify latent factors within and across modalities [2]. Often, some of the inferred factors align with known covariates, but currently this alignment is established only post hoc. We present Semi-supervised Omics Factor Analysis (SOFA), a method that incorporates known sources of variation into the model and focuses latent factor discovery on novel sources of variation. We apply SOFA to a pan-gynecologic multi-omics data set from The Cancer Genome Atlas (TCGA), where we guide the model with cancer type labels and discover an independent factor representing an immune infiltration versus proliferation transition axis. The inferred factor is predictive of treatment outcomes. We further use SOFA on a single-cell multi-omics data set (RNA-seq and ATAC-seq) from the human cerebral cortex to identify microglial subpopulations during adolescence that are associated with cell migration and inflammatory response. SOFA simplifies the discovery of novel patterns and structures in multi-omics data.