Back to BERT in 2026: ModernGENA as a Strong, Efficient Baseline for DNA Foundation Models

This paper is a preprint and has not been certified by peer review.

Authors

Aspidova, A.; Kuratov, Y.; Shadskiy, A.; Burtsev, M.; Fishman, V.

Abstract

Recent advances in DNA language models have mainly come from building larger and more complex architectures, making it harder to understand the effect of changes to standard components such as the transformer layers widely used in NLP. In this work, we study whether and how a modernized BERT-style backbone (ModernBERT) can be adapted to genomic sequence modeling to improve computational efficiency, training stability, and downstream performance. Under controlled experimental settings, we benchmark efficiency across a range of sequence lengths and evaluate downstream performance on the Nucleotide Transformer benchmark. The resulting model, ModernGENA, achieves a strong efficiency-quality trade-off and ranks among the top-performing models in our evaluation suite. To support reproducibility and provide a solid default reference point for future architectural work in genomics, we release the full implementation and configuration of ModernGENA as an open, reusable baseline, and make ModernGENA base and ModernGENA large publicly available through the DNA language models collection on Hugging Face.
