Benchmarking DNA Foundation Models: Biological Blind Spots in Evo2 Variant-Effect Prediction

This paper is a preprint and has not been certified by peer review.

Authors

Mathur, V.; Sachidanandam, R.

Abstract

DNA foundation models such as Evo and DNABERT-2 have generated considerable interest for clinically relevant genomics applications, particularly variant-effect prediction (VEP). However, rigorous benchmarks that assess whether these models capture known biological constraints remain limited. Here, we develop controlled evaluation metrics based on well-characterized nuclear and mitochondrial sequence features and curated variant sets. Applying these benchmarks to Evo2, we identify systematic blind spots in short-range biological signals (e.g., codon usage bias) and observe sensitivity to contextual features that should be biologically neutral. These findings challenge current claims of zero-shot pathogenicity prediction and raise concerns regarding the clinical readiness of such models. The benchmarking framework introduced here provides a foundation for improving training strategies and for standardized evaluation of future genomic foundation models.
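As an illustration of the kind of short-range signal the abstract mentions, the sketch below computes relative synonymous codon usage (RSCU), a common summary of codon usage bias. It is not taken from the paper and is not the authors' benchmark: the function name `rscu` and the toy input are hypothetical, and the standard nuclear genetic code is assumed (mitochondrial sequences would need a different codon table).

```python
from collections import Counter
from itertools import product

# Standard nuclear genetic code (assumption; mitochondrial codes differ).
# Codons are enumerated in TCAG order, matching the amino-acid string below.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def rscu(coding_sequences):
    """Return RSCU per codon for in-frame coding sequences.

    RSCU = observed codon count * (number of synonymous codons)
           / (total count of all codons for that amino acid).
    A value of 1.0 means no bias within the synonymous family.
    Families with no observed codons are omitted from the result.
    """
    counts = Counter()
    for seq in coding_sequences:
        seq = seq.upper()
        usable = len(seq) - len(seq) % 3  # ignore a trailing partial codon
        for i in range(0, usable, 3):
            codon = seq[i:i + 3]
            if codon in CODON_TABLE:  # skip codons with ambiguous bases
                counts[codon] += 1

    # Group codons by the amino acid (or stop signal) they encode.
    families = {}
    for codon, aa in CODON_TABLE.items():
        families.setdefault(aa, []).append(codon)

    result = {}
    for codons in families.values():
        total = sum(counts[c] for c in codons)
        if total == 0:
            continue
        for c in codons:
            result[c] = counts[c] * len(codons) / total
    return result


if __name__ == "__main__":
    # Toy input: four alanine codons, three GCT and one GCC.
    for codon, value in sorted(rscu(["GCTGCTGCTGCC"]).items()):
        print(codon, round(value, 2))
```

One way such a statistic could be used in a controlled evaluation is to check whether a model's scores for synonymous substitutions track the organism's observed codon preferences; how the paper actually constructs its metrics is not specified in this abstract.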
