Science Cast

Glitch genes: embedding geometry predicts functional fragility in single-cell foundation models

Justin Whalley, June 28, 2026 6:56am


Voice is AI-generated

This paper is a preprint and has not been certified by peer review.


bioRxiv, June 27, 2026 12:00am

Authors

Whalley, J. P.

Abstract

Background: Single-cell foundation models are increasingly used for perturbation prediction and gene network inference, but their learned gene representations are rarely audited directly. In natural language processing, geometric analyses of token embeddings have revealed anomalous "glitch tokens" associated with erratic model behaviour. Whether analogous representational anomalies exist in biological foundation models remains unknown.

Results: This study introduces a weight-only geometric audit framework that scores genes by embedding norm, centroid distance, cosine similarity, and isolation to identify representational outliers. Applied to Geneformer, scGPT, and scFoundation, the analysis identifies hundreds of outliers in discrete-tokenisation models. Shared Geneformer-scGPT outliers are enriched for loss-of-function intolerance (OR = 12.0) and disease association (OR = 3.7), whereas scFoundation's continuous value embeddings form a near-isotropic space with no detectable enrichment under the annotation panels tested. In Geneformer, geometric anomaly predicts perturbation sensitivity (ρ = 0.725); the signal is supported by mask-in-place experiments, shows rank agreement in real PBMC cells, and correlates with Replogle perturb-seq effect sizes (ρ = 0.645). Metric decomposition separates magnitude-driven outliers, enriched for highly expressed housekeeping genes, from isolation-driven outliers enriched for tissue-restricted genes.

Conclusions: Tokenisation strategy helps determine which genes are represented reliably. Embedding geometry provides a rapid, model-agnostic diagnostic that requires only an embedding matrix and can flag genes whose representations warrant caution before downstream use.
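The abstract names four geometric scores (embedding norm, centroid distance, cosine similarity, and isolation) but does not give their exact definitions. The following is a minimal sketch of what such a weight-only audit could look like, not the paper's implementation. Everything in it is an assumption made for illustration: the function name `geometric_audit`, the use of mean cosine similarity to all other genes, isolation as mean Euclidean distance to the `k` nearest neighbours, the equal-weight z-score composite, and the toy gene names and vectors.

```python
import math
from statistics import mean, pstdev


def l2_norm(v):
    """Euclidean length of a vector."""
    return math.sqrt(sum(x * x for x in v))


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    nu, nv = l2_norm(u), l2_norm(v)
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return sum(a * b for a, b in zip(u, v)) / (nu * nv)


def zscores(values):
    """Population z-scores; all zeros when the values do not vary."""
    mu, sd = mean(values), pstdev(values)
    return [0.0 if sd == 0.0 else (x - mu) / sd for x in values]


def geometric_audit(embeddings, k=2):
    """Score each gene from the embedding matrix alone (hypothetical definitions).

    embeddings: dict mapping gene name -> list of floats (one row per gene).
    Returns a dict mapping gene name -> dict of raw metrics plus a composite.
    """
    genes = list(embeddings)
    vecs = [embeddings[g] for g in genes]
    dim = len(vecs[0])
    centroid = [mean(v[j] for v in vecs) for j in range(dim)]

    norms, centroid_dists, mean_cosines, isolations = [], [], [], []
    for i, v in enumerate(vecs):
        others = [w for j, w in enumerate(vecs) if j != i]
        norms.append(l2_norm(v))
        centroid_dists.append(math.dist(v, centroid))
        mean_cosines.append(mean(cosine(v, w) for w in others))
        # Isolation: mean distance to the k nearest other genes.
        nearest = sorted(math.dist(v, w) for w in others)[:k]
        isolations.append(mean(nearest))

    # Assumed composite: unusual norm in either direction, far from the
    # centroid, low similarity to other genes, and far from neighbours.
    z_norm = [abs(z) for z in zscores(norms)]
    z_dist = zscores(centroid_dists)
    z_cos = [-z for z in zscores(mean_cosines)]
    z_iso = zscores(isolations)

    report = {}
    for i, g in enumerate(genes):
        report[g] = {
            "norm": norms[i],
            "centroid_distance": centroid_dists[i],
            "mean_cosine": mean_cosines[i],
            "isolation": isolations[i],
            "composite": (z_norm[i] + z_dist[i] + z_cos[i] + z_iso[i]) / 4.0,
        }
    return report


# Toy data: four similar vectors and one deliberately anomalous one.
toy = {
    "GENE_A": [1.0, 0.1, 0.0],
    "GENE_B": [0.9, 0.2, 0.1],
    "GENE_C": [1.1, 0.0, 0.1],
    "GENE_D": [1.0, 0.1, 0.2],
    "GENE_GLITCH": [-3.0, 4.0, 5.0],
}
report = geometric_audit(toy)
ranked = sorted(report, key=lambda g: report[g]["composite"], reverse=True)
print(ranked[0])
```

Keeping the four raw metrics alongside the composite mirrors the abstract's point that magnitude-driven and isolation-driven outliers can be separated; a real audit would read the rows from a model's gene embedding matrix rather than a hand-written dictionary.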
