Machine learning cross-platform proteomic imputation enables protein quality scoring and replication of epidemiological associations

This paper is a preprint and has not been certified by peer review.

Authors

Li, L.; Alaa, A.; Tan, Y.; Demirel, I.; Friedman, S.; Zha, Q.; Trac, R. P.; Taylor, K. D.; Yu, B.; Ballantyne, C. M.; Deo, R.; Dubin, R.; Tsai, M. Y.; Peloso, G. M.; Brody, J.; Austin, T.; Psaty, B. M.; Nicholas, J.; Raffield, L. M.; Tahir, U.; Coresh, J.; Hornsby, W.; Chan, A.; Rich, S. S.; Rotter, J. I.; Ganz, P.; Gerszten, R.; Philippakis, A.; Natarajan, P.; Yu, Z.

Abstract

High-throughput affinity-based proteomics has advanced biomedical research, yet fundamental, persistent discordance between the mainstream platforms (SomaScan and Olink) routinely undermines the replication of findings. This platform-driven non-replication complicates downstream biological validation and biomarker prioritization. Here, we develop a machine learning-based framework for cross-platform protein value imputation to resolve this translational bottleneck. Using paired proteomic data measured by both SomaScan and Olink in 5,325 participants of the Multi-Ethnic Study of Atherosclerosis, we developed models to impute cross-platform measurements and applied them to two independent and demographically distinct cohorts (Cardiovascular Health Study [N=3,171] and UK Biobank [UKB; N=41,405]) for external validation. Our bi-directional model 1) established an imputation performance-based protein fidelity index, validated against gold-standard measurements from the Atherosclerosis Risk in Communities study (N=101) and the Nurses' Health Study (N=54), 2) enabled imputation of platform-exclusive protein measurements, and 3) facilitated calibration of overlapping proteins. We demonstrate the utility of this framework through three applications: 1) fidelity-informed analyses that enhanced the replication of biomarker discovery, 2) recovery of SomaScan signals that were previously inaccessible in UKB's original Olink measurements, and 3) improved replication performance for overlapping proteins. Our study offers a translational roadmap that allows researchers to achieve reliable epidemiological replication, target specific assays for future optimization, and prioritize biological signal over platform noise.
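
The idea of an imputation performance-based fidelity index can be illustrated with a minimal sketch. The abstract does not specify the model class or the scoring metric, so everything below is an assumption made for illustration: closed-form ridge regression stands in for the machine learning model, the score is the held-out Pearson correlation between imputed and measured values, the data are synthetic, and the names `fit_ridge` and `fidelity_index` are hypothetical rather than taken from the paper.

```python
import numpy as np


def fit_ridge(X, y, alpha=1.0):
    """Closed-form ridge regression with an unpenalized intercept.

    Stand-in model only; the paper's actual model is not described in the abstract.
    """
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    coef = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(X.shape[1]), Xc.T @ yc)
    return coef, y_mean - x_mean @ coef


def fidelity_index(source, target, train_frac=0.8, alpha=1.0, seed=0):
    """Hypothetical fidelity score per target-platform protein.

    Trains on a random subset of paired samples, imputes the held-out samples
    from the source platform, and returns the Pearson r between imputed and
    measured values for each target protein (assumed metric).
    """
    rng = np.random.default_rng(seed)
    order = rng.permutation(source.shape[0])
    n_train = int(train_frac * len(order))
    train, test = order[:n_train], order[n_train:]
    scores = np.empty(target.shape[1])
    for j in range(target.shape[1]):
        coef, intercept = fit_ridge(source[train], target[train, j], alpha)
        imputed = source[test] @ coef + intercept
        scores[j] = np.corrcoef(imputed, target[test, j])[0, 1]
    return scores


# Synthetic paired data: 500 samples, 10 source-platform proteins.
rng = np.random.default_rng(42)
n_samples, n_source = 500, 10
source = rng.normal(size=(n_samples, n_source))
signal = source @ rng.normal(size=n_source)

# Target protein 0 tracks the shared biological signal with little assay noise;
# target protein 1 is pure noise, mimicking a poorly performing assay.
well_measured = signal + 0.1 * rng.normal(size=n_samples)
noise_only = rng.normal(size=n_samples)
target = np.column_stack([well_measured, noise_only])

scores = fidelity_index(source, target)
print(scores)
```

In this toy setting the protein that carries shared signal scores close to 1 and the noise-only protein scores near 0, which is the sense in which imputation performance could serve as a proxy for measurement quality. Running the same function with `source` and `target` swapped would give the other direction of a bi-directional analysis.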
