DIANA: Deep Learning Identification and Assessment of Ancient DNA
Duitama Gonzalez, C.; Lopopolo, M.; Nishimura, L.; Faure, R.; Duchene, S.
Abstract: The field of ancient metagenomics provides insights into past microbiomes, but as datasets grow, methods that rely on reference databases have limited scope. Here, we introduce DIANA, a multi-task neural network that predicts key metadata categories from unitig abundances. Trained on 2,597 run accessions (1.72 Tbp of assembled unitig sequences), DIANA accurately identifies sample host (94.6%), community type (90.0%), and material (88.9%) on held-out test data and demonstrates robust generalisation on an independent validation set. A key innovation is DIANA's ability to perform semantic generalisation: it correctly assigns samples whose labels were unseen during training, such as novel subspecies, to their appropriate parent categories. By leveraging both known and uncharacterized genomic sequences, DIANA provides a rapid, data-driven system for metadata validation and quality control, accelerating discovery in ancient metagenomics research.
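To make the idea of semantic generalisation concrete, the following is a minimal sketch of how a prediction could be scored at the parent-category level when the true label never appeared in training. It is not DIANA's evaluation code: the label names, the `PARENT` lookup table, and the functions `to_parent` and `parent_level_accuracy` are all hypothetical and introduced here only for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical hierarchy: fine-grained label -> parent category seen in training.
PARENT: Dict[str, str] = {
    "Host A subsp. novel": "Host A",
    "Host B subsp. novel": "Host B",
}


def to_parent(label: str, parent: Dict[str, str]) -> str:
    """Follow parent links until reaching a label with no parent."""
    visited = set()
    while label in parent and label not in visited:
        visited.add(label)  # guard against accidental cycles in the table
        label = parent[label]
    return label


def parent_level_accuracy(
    pairs: List[Tuple[str, str]], parent: Dict[str, str]
) -> float:
    """Fraction of (true_label, predicted_label) pairs whose prediction
    equals the parent category of the true label."""
    if not pairs:
        raise ValueError("need at least one (true, predicted) pair")
    hits = sum(1 for true, pred in pairs if to_parent(true, parent) == pred)
    return hits / len(pairs)


# Illustrative (made-up) predictions: (true label, predicted label).
pairs = [
    ("Host A subsp. novel", "Host A"),  # unseen subspecies, right parent
    ("Host B subsp. novel", "Host A"),  # unseen subspecies, wrong parent
    ("Host A", "Host A"),               # label already at parent level
]
score = parent_level_accuracy(pairs, PARENT)
```

Under this scoring, a sample labelled with a novel subspecies counts as correct whenever the model returns the category that subspecies belongs to, which is the behaviour the abstract describes.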