A conceptual framework for revealing minor bacterial signals in microbiome data through guided data transformation
Martin, D.; Houedry, P.; Derbre, F.; Monbet, V.
Abstract
The microbiome is a rich source of biological data that offers promising insights for personalized medicine. However, inferring host health from gut bacterial composition using statistical methods remains a challenge. Here, we show that groups of bacterial species with high abundance and variance (referred to as dominant bacterial signals and often associated with enterotype) exert a disproportionately large influence on microbiome analyses, hiding the contribution of less abundant species (referred to as minor bacterial signals). To address this limitation, we propose a guided data transformation that highlights minor bacterial signals while minimizing the impact of dominant bacterial signals on statistical analyses of the microbiome. This transformation (i) yields an alternative clustering that is more closely associated with host health and (ii) helps improve the performance of supervised machine learning algorithms in high-dimensional settings (n << p). Applied to a real dataset, the transformation yields results suggesting that dominant bacterial signals may act as a confounding variable when predicting host health.