OPLS-based Multiclass Classification and Data-Driven Inter-Class Relationship Discovery
Forsgren, E.; Bjorkblom, B.; Trygg, J.; Jonsson, P.
Abstract
Multiclass datasets and large-scale studies are increasingly common in omics sciences, drug discovery, and clinical research, driven by advances in analytical platforms. Handling these datasets efficiently and discerning subtle differences across multiple classes remains a significant challenge. In metabolomics, two-class OPLS-DA (Orthogonal Projection to Latent Structures Discriminant Analysis) models are widely used because they discriminate well and provide interpretable information on class differences. However, these models struggle in multiclass settings. A common solution is to transform the multiclass comparison into multiple two-class comparisons. This is more effective than a global multiclass OPLS-DA model, but it makes model building manual and time-consuming and complicates interpretation. Here, we introduce an extension of OPLS-DA for data-driven multiclass classification: Orthogonal Partial Least Squares-Hierarchical Discriminant Analysis (OPLS-HDA). OPLS-HDA integrates Hierarchical Cluster Analysis (HCA) with the OPLS-DA framework to create a decision tree, which addresses the multiclass classification problem and provides an intuitive visualization of inter-class relationships. To avoid overfitting and ensure reliable predictions, we use cross-validation during model building. Benchmark results show that OPLS-HDA performs competitively with eight established methods across diverse datasets. With its versatility, interpretability, and ease of use, OPLS-HDA offers an efficient approach to dissecting complex multiclass datasets and is applicable across various fields.
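The idea of combining hierarchical clustering of classes with a two-class model at each split can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the distance measure, linkage, or tree construction, so the sketch assumes Euclidean distance between class centroids, and it swaps in a simple nearest-centroid rule where OPLS-HDA would fit a cross-validated two-class OPLS-DA model. All names (`Node`, `fit_tree`, `predict`) and the toy data are hypothetical.

```python
from dataclasses import dataclass
from itertools import combinations
from math import dist


@dataclass
class Node:
    labels: tuple            # class labels grouped under this node
    center: list             # centroid of the training rows under this node
    left: "Node | None" = None
    right: "Node | None" = None


def centroid(rows):
    """Column-wise mean of a list of equal-length feature vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def fit_tree(X, y):
    """Merge the two closest class groups repeatedly until one root remains.

    Assumption: groups are compared by Euclidean distance between centroids.
    """
    groups = []
    for label in sorted(set(y)):
        rows = [x for x, lab in zip(X, y) if lab == label]
        groups.append((Node((label,), centroid(rows)), rows))
    while len(groups) > 1:
        i, j = min(
            combinations(range(len(groups)), 2),
            key=lambda p: dist(groups[p[0]][0].center, groups[p[1]][0].center),
        )
        (a, rows_a), (b, rows_b) = groups[i], groups[j]
        rows = rows_a + rows_b
        parent = Node(a.labels + b.labels, centroid(rows), a, b)
        groups = [g for k, g in enumerate(groups) if k not in (i, j)]
        groups.append((parent, rows))
    return groups[0][0]


def predict(tree, x):
    """Walk from the root to a leaf, making one two-class decision per split.

    Stand-in rule: pick the child with the nearer centroid. In OPLS-HDA each
    split would instead be decided by a two-class OPLS-DA model.
    """
    node = tree
    while node.left is not None:
        nearer_left = dist(x, node.left.center) <= dist(x, node.right.center)
        node = node.left if nearer_left else node.right
    return node.labels[0]


# Toy data (hypothetical): classes A and B lie close together, C is far away.
X = [[0.0, 0.0], [0.2, 0.1], [1.0, 0.0], [1.2, 0.1], [10.0, 10.0], [10.2, 9.9]]
y = ["A", "A", "B", "B", "C", "C"]
tree = fit_tree(X, y)
```

In this toy example the tree first separates C from the combined A and B group and only then separates A from B, which is the kind of inter-class relationship the hierarchy is meant to expose. Each split is a two-class problem, so the interpretability of a two-class model carries over to every node.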