Gaining Biological Insights through Supervised Data Visualization

This paper is a preprint and has not been certified by peer review.

Authors

Rhodes, J. S.; Aumon, A.; Morin, S.; Girard, M.; Larochelle, C.; Lahav, B.; Brunet-Ratnasingham, E.; Zhang, W.; Cutler, A.; Zhou, A.; Kaufmann, D. E.; Zandee, S.; Prat, A.; Wolf, G.; Moon, K. R.

Abstract

Dimensionality reduction-based data visualization is pivotal in comprehending complex biological data. The most common methods, such as PHATE, t-SNE, and UMAP, are unsupervised and therefore reflect the dominant structure in the data, which may be independent of expert-provided labels. Here we introduce a supervised data visualization method called RF-PHATE, which integrates expert knowledge for further exploration of the data. RF-PHATE leverages random forests to capture intricate feature-label relationships. Extracting information from the forest, RF-PHATE generates low-dimensional visualizations that highlight relevant data relationships while disregarding extraneous features. This approach scales to large datasets and applies to classification and regression. We illustrate RF-PHATE's prowess through three case studies. In a multiple sclerosis study using longitudinal clinical and imaging data, RF-PHATE unveils a sub-group of patients with non-benign relapsing-remitting multiple sclerosis, demonstrating its aptitude for time-series data. In the context of Raman spectral data, RF-PHATE effectively showcases the impact of antioxidants on diesel exhaust-exposed lung cells, highlighting its proficiency in noisy environments. Furthermore, RF-PHATE aligns established geometric structures with COVID-19 patient outcomes, enriching interpretability in a hierarchical manner.
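
The general idea of extracting information from a random forest to drive a supervised embedding can be illustrated with a minimal sketch. This is not the authors' implementation and is not RF-PHATE itself: it assumes a simple leaf co-occurrence proximity (the fraction of trees in which two samples share a leaf) and swaps in classical multidimensional scaling in place of PHATE. The function names, the synthetic dataset, and all parameter values are hypothetical choices made for illustration only.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier


def forest_proximity(forest, X):
    """Fraction of trees in which each pair of samples lands in the same leaf."""
    leaves = forest.apply(X)  # shape (n_samples, n_trees): leaf index per tree
    same_leaf = leaves[:, None, :] == leaves[None, :, :]
    return same_leaf.mean(axis=2)


def classical_mds(dist, n_components=2):
    """Classical MDS on a distance matrix (a stand-in for PHATE in this sketch)."""
    n = dist.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ (dist ** 2) @ centering
    eigvals, eigvecs = np.linalg.eigh(gram)  # eigenvalues in ascending order
    top = np.argsort(eigvals)[::-1][:n_components]
    # Clip small negative eigenvalues that arise when dist is not Euclidean.
    return eigvecs[:, top] * np.sqrt(np.clip(eigvals[top], 0.0, None))


def supervised_embedding(X, y, n_trees=100, seed=0):
    """Fit a forest on (X, y), then embed samples using forest proximities."""
    forest = RandomForestClassifier(n_estimators=n_trees, random_state=seed)
    forest.fit(X, y)
    prox = forest_proximity(forest, X)
    # Samples the forest treats alike (high proximity) end up close together.
    return prox, classical_mds(1.0 - prox)


# Hypothetical data: 3 informative features hidden among 17 noise features.
X, y = make_classification(
    n_samples=200, n_features=20, n_informative=3, n_redundant=0, random_state=0
)
prox, embedding = supervised_embedding(X, y)
print(embedding.shape)
```

Because the forest splits mainly on features that help predict the labels, noise features contribute little to the proximities, which is the intuition behind highlighting relevant relationships while disregarding extraneous features. For regression targets, `RandomForestRegressor` could be substituted under the same assumptions.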
