Molecular Translators as a Computational Primitive for Biomarker Discovery: Learnability Gains Under Conserved Information Ceilings
Saisan, P. A.; Patel, S. P.
Abstract

Virtual molecular mapping systems such as MISO and GigaTIME introduce a potentially transformative primitive in computational pathology: translation of H&E whole-slide images into biologically structured molecular representations, learned on paired cohorts and deployed as an inference-time map. Despite sustained progress in machine learning, H&E-to-molecular-biomarker (e.g., gene mutation) prediction continues to exhibit recurrent field-level performance plateaus whose drivers remain poorly resolved. It remains unclear whether continued optimization targets a removable methodological limitation or instead presses against an intrinsic ceiling imposed by morphology. We develop a formal framework characterizing what deterministic translators can and cannot change. Histology-based biomarker modeling is governed by two constraints: method-limited gaps (finite labels, weak supervision, structured nuisance) and modality-limited ceilings (intrinsic slide-specific information in morphology). Because deterministic translation introduces no new slide-level measurements at inference, H&E information ceilings are conserved; however, translation can still improve finite-sample learnability, yielding an apparent information-performance paradox that we formalize as learnability gains under conserved information ceilings. We derive falsifiable signatures distinguishing these regimes and characterize them in controlled analytical experiments anchored to representative systems, including MISO and GigaTIME. We introduce an open-source toolkit comprising learning regime diagnosis, information-ceiling estimation, phase analyses, fidelity perturbation tests, and shortcut-confounding stress tests as an operational rubric for identifying and overcoming removable performance plateaus in translator-assisted molecular biomarker discovery and computational pathology.
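One standard way to read the conserved-ceiling claim is through the data-processing inequality: for any deterministic map f applied to a slide X, the mutual information with a biomarker Y satisfies I(Y; f(X)) <= I(Y; X). The sketch below checks this numerically on a toy discrete distribution. It is an illustration under stated assumptions, not the paper's formalism or its toolkit: the joint distribution `joint_xy`, the map `translator`, and the helper names are all hypothetical.

```python
import math
from collections import defaultdict


def mutual_information(joint):
    """Mutual information I(A; B) in bits from a dict {(a, b): probability}."""
    pa = defaultdict(float)
    pb = defaultdict(float)
    for (a, b), p in joint.items():
        pa[a] += p
        pb[b] += p
    return sum(
        p * math.log2(p / (pa[a] * pb[b]))
        for (a, b), p in joint.items()
        if p > 0
    )


def push_forward(joint, f):
    """Joint distribution of (f(X), Y) induced by a deterministic map f on X."""
    out = defaultdict(float)
    for (x, y), p in joint.items():
        out[(f(x), y)] += p
    return dict(out)


# Hypothetical toy joint over a morphology state X in {0, 1, 2, 3}
# and a binary biomarker Y in {0, 1}. Probabilities sum to 1.
joint_xy = {
    (0, 0): 0.20, (0, 1): 0.05,
    (1, 0): 0.15, (1, 1): 0.10,
    (2, 0): 0.10, (2, 1): 0.15,
    (3, 0): 0.05, (3, 1): 0.20,
}


def translator(x):
    """Hypothetical lossy deterministic translator: merges states pairwise."""
    return x // 2


i_xy = mutual_information(joint_xy)
i_ty = mutual_information(push_forward(joint_xy, translator))

print(f"I(Y; X)    = {i_xy:.4f} bits")
print(f"I(Y; f(X)) = {i_ty:.4f} bits")
```

In this toy setting the translated representation never carries more information about Y than X does, and an invertible map such as `lambda x: 3 - x` leaves the value unchanged. The inequality says nothing about how easily a finite-sample learner can extract that information, which is where the abstract locates the possible gains from translation.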