From Hype to Health Check: Critical Evaluation of Drug Response Prediction Models with DrEval

This paper is a preprint and has not been certified by peer review.


Authors

Bernett, J.; Iversen, P.; Picciani, M.; Wilhelm, M.; Baum, K.; List, M.

Abstract

Motivation: Large-scale drug sensitivity screens have enabled training drug response prediction models based on cancer cell line omics profiles to facilitate personalized medicine. While model performance reported in the literature appears promising, no successful translation to the clinic has been reported.

Results: We identify six primary obstacles to clinical translation: non-reproducible models, data leakage leading to poor generalization, pseudoreplication, biased evaluation metrics, missing ablation studies, and inconsistent viability data reducing comparability across models. Together, these issues lead to overly optimistic performance estimates of state-of-the-art models and make it challenging to track progress in the field. To address this, we present DrEval, a pipeline for unbiased and biologically meaningful evaluation of cancer drug response models. It includes baseline and literature models with consistent hyperparameter tuning, statistically sound evaluations, and cross-study benchmarks. DrEval enables ablation studies and publication-ready visualizations. It allows researchers to focus on model development without implementing their own evaluation protocol. We find that deep learning models barely outperform a naive model predicting the mean drug and cell line effects, while no complex model significantly outperforms properly tuned tree-based ensemble baselines in relevant settings. We advocate making our pipeline a standard benchmark for cancer drug response prediction, ensuring a clinically relevant and robust assessment.

Availability and implementation: DrEval consists of a Python package, available on PyPI (drevalpy) and GitHub (github.com/daisybio/drevalpy), and an accompanying nf-core pipeline (github.com/nf-core/drugresponseeval). All data is available on Zenodo (DOI: 10.5281/zenodo.12633909), and preprocessing scripts are available at github.com/daisybio/preprocess_drp_data.
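
To make the naive baseline mentioned in the abstract concrete, the following is a minimal sketch of one plausible reading of "predicting the mean drug and cell line effects": an additive model with an overall mean plus a per-cell-line offset and a per-drug offset. This is an assumption for illustration, not DrEval's implementation; the function names `fit_mean_effects` and `predict_mean_effects` and the toy data are hypothetical and are not part of the drevalpy API.

```python
import numpy as np


def fit_mean_effects(cell_lines, drugs, responses):
    """Estimate an overall mean plus additive cell-line and drug offsets.

    Illustrative sketch only; the exact definition used by DrEval may differ.
    """
    responses = np.asarray(responses, dtype=float)
    cell_lines = np.asarray(cell_lines)
    drugs = np.asarray(drugs)
    overall = responses.mean()
    # Offset of each cell line's mean response from the overall mean.
    cell_effect = {
        c: responses[cell_lines == c].mean() - overall
        for c in np.unique(cell_lines)
    }
    # Offset of each drug's mean response from the overall mean.
    drug_effect = {
        d: responses[drugs == d].mean() - overall
        for d in np.unique(drugs)
    }
    return overall, cell_effect, drug_effect


def predict_mean_effects(model, cell_lines, drugs):
    """Predict overall mean + cell-line offset + drug offset for each pair."""
    overall, cell_effect, drug_effect = model
    # Cell lines or drugs not seen during fitting get an offset of 0.
    return np.array([
        overall + cell_effect.get(c, 0.0) + drug_effect.get(d, 0.0)
        for c, d in zip(cell_lines, drugs)
    ])


# Hypothetical toy data: two cell lines, two drugs, made-up response values.
train_cells = ["A", "A", "B", "B"]
train_drugs = ["x", "y", "x", "y"]
train_resp = [1.0, 3.0, 3.0, 5.0]

model = fit_mean_effects(train_cells, train_drugs, train_resp)
preds = predict_mean_effects(model, ["A", "B", "C"], ["x", "y", "x"])
print(preds)
```

A baseline of this kind uses no omics or drug features at all, which is why it is a useful reference point: a model that barely beats it has learned little beyond average drug potency and average cell line sensitivity.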
