Predicting Phenotypic Traits Using a Massive RNA-seq Dataset

This paper is a preprint and has not been certified by peer review.

Authors

Hadish, J. A.; Honaas, L. A.; Ficklin, S. P.

Abstract

Transcriptomic data can be used to predict phenotypic traits that are affected by the environment. This type of prediction is particularly useful for traits that are difficult to measure directly, and it has become increasingly popular for monitoring high-value agricultural crops and in precision medicine. Despite this popularity, little research has addressed how many samples these models need to be accurate or which normalization method should be used. Here we create a massive RNA-seq dataset from publicly available Arabidopsis thaliana data with corresponding measurements for age and tissue type. We use this dataset to determine how many samples are required for accurate model prediction and which normalization method performs best. We find that Median Ratios Normalization significantly increases performance when predicting age. We also find that, for our dataset, only a few hundred samples are required to predict tissue type, and only a few thousand samples are necessary to accurately predict age. Researchers should consider these results when choosing the number of samples in a transcriptomic experiment and when processing the data.
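
For readers unfamiliar with Median Ratios Normalization, the following is a minimal sketch of the commonly described median-of-ratios procedure, written with NumPy. It is an illustration only and not the authors' pipeline: the function name `median_ratios_normalize`, the genes-by-samples layout, and the toy count matrix are all assumptions made for this example.

```python
import numpy as np


def median_ratios_normalize(counts):
    """Median-of-ratios normalization (illustrative sketch, not the authors' code).

    counts: array of raw read counts with shape (genes, samples); this layout
    is an assumption of the example.
    Returns the normalized counts and the per-sample size factors.
    """
    counts = np.asarray(counts, dtype=float)

    # Only genes with a nonzero count in every sample have a finite log
    # geometric mean, so only these genes are used to estimate size factors.
    expressed = np.all(counts > 0, axis=1)
    if not expressed.any():
        raise ValueError("no gene has nonzero counts in every sample")

    log_counts = np.log(counts[expressed])

    # Per-gene log geometric mean across samples acts as a pseudo-reference.
    log_geo_mean = log_counts.mean(axis=1, keepdims=True)

    # Size factor of a sample = median over genes of (count / reference).
    size_factors = np.exp(np.median(log_counts - log_geo_mean, axis=0))

    # Divide every gene (including those excluded above) by the size factor.
    return counts / size_factors, size_factors


# Hypothetical toy data: sample 2 was sequenced twice as deeply as sample 1.
toy_counts = np.array([
    [10, 20],
    [100, 200],
    [50, 100],
    [0, 5],  # has a zero, so it is excluded from size-factor estimation
])
normalized, size_factors = median_ratios_normalize(toy_counts)
```

In this toy case the two samples differ only in sequencing depth, so after normalization the genes expressed in both samples have equal values across the two columns.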
