TxConformal: Controlling False Discoveries in AI-Driven Therapeutic Discovery
Jin, Y.; Huang, K.; Diamant, N.; Buchholz, K. R.; Rutherford, S. T.; Skelton, N.; Biancalani, T.; Scalia, G.; Leskovec, J.; Candes, E. J.
Abstract
Artificial Intelligence (AI) is transforming therapeutic discovery by scoring a large set of promising candidates and prioritizing a shortlist for further investigation. Quantifying the reliability of AI scores and preventing false positives among selected candidates is key to the efficiency of the discovery process. Conformal prediction (CP) has emerged as a popular tool for guiding such prioritization, especially via the conformal selection framework to control false discovery rates (FDR) in selecting top-ranked candidates under distributional shift. However, deploying these advances in real-world therapeutic discovery remains challenging: distribution shifts are difficult to quantify and correct in high-dimensional biomedical data, and practical workflows often require flexible error metrics. Here, we present TxConformal, a general framework for trustworthy decision making when building shortlists using AI scores. TxConformal adjusts for distribution shift by balancing the hidden representations in AI models and then provides confidence measures for true discoveries of target biological properties. These confidence measures, interpretable as p-values, can be used in conjunction with statistical multiple testing procedures to derive selection decisions with limited false positives or to estimate the errors in given selection decisions. TxConformal controls the false positive rate in six real-world tasks spanning various therapeutic discovery stages, modalities, and AI models with realistic data splits. When selecting promising combinatorial genetic perturbations, TxConformal nearly halves false-positive selections compared to baseline methods, substantially reducing unnecessary experimental costs by tens of thousands of dollars.
When selecting stable protein structures under mutant shifts, TxConformal identifies about 10 times more proteins than baseline methods at stringent thresholds when run at a target FDR level of 10%, recovering over 90% of valuable candidates that baseline methods miss due to unaccounted distribution shifts. Furthermore, we demonstrate that TxConformal robustly supports various alternative error metrics suitable for resource-constrained settings. Finally, in a prospective fixed-budget virtual screening campaign for novel antibiotic discovery, TxConformal predicted false positives in close agreement with experimental outcomes, with substantial improvements over simple baselines.
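The abstract describes turning AI scores into p-values and feeding them to a multiple testing procedure. The following is a minimal illustrative sketch of that general idea, not TxConformal's implementation: it computes unweighted conformal p-values from a labeled calibration set and applies the Benjamini-Hochberg procedure at a target FDR level. It assumes calibration and test candidates are exchangeable, so it omits the distribution-shift adjustment that TxConformal adds. The function names `conformal_pvalues` and `benjamini_hochberg`, the binary hit labels, and the convention that higher scores indicate more promising candidates are all assumptions made for illustration.

```python
import numpy as np


def conformal_pvalues(cal_scores, cal_labels, test_scores):
    """Conformal p-values for the null 'this test candidate is not a hit'.

    Assumed setup (hypothetical, not from the paper): higher score means
    more promising, and cal_labels is True for confirmed hits.
    """
    cal_scores = np.asarray(cal_scores, dtype=float)
    cal_labels = np.asarray(cal_labels, dtype=bool)
    test_scores = np.asarray(test_scores, dtype=float)

    # Only calibration non-hits can count against a test candidate.
    null_scores = np.sort(cal_scores[~cal_labels])

    # For each test score, count non-hits scoring at least as high.
    # Counting ties makes the p-value conservative.
    first_ge = np.searchsorted(null_scores, test_scores, side="left")
    n_at_least = null_scores.size - first_ge

    # The +1 terms account for the test candidate itself.
    return (1.0 + n_at_least) / (cal_scores.size + 1.0)


def benjamini_hochberg(pvalues, q):
    """Return sorted indices selected by Benjamini-Hochberg at level q."""
    p = np.asarray(pvalues, dtype=float)
    m = p.size
    if m == 0:
        return np.array([], dtype=int)

    order = np.argsort(p)
    thresholds = q * np.arange(1, m + 1) / m
    below = p[order] <= thresholds
    if not below.any():
        return np.array([], dtype=int)

    # Select the k smallest p-values, where k is the largest rank
    # whose sorted p-value falls under its threshold.
    k = np.nonzero(below)[0].max() + 1
    return np.sort(order[:k])
```

Under these assumptions, a shortlist at a 10% target FDR would be `benjamini_hochberg(conformal_pvalues(cal_scores, cal_labels, test_scores), q=0.1)`. With a small calibration set the smallest attainable p-value is `1 / (n + 1)`, so few or no candidates may be selected at stringent levels.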