Binning meets taxonomy: TaxVAMB improves metagenome binning using bi-modal variational autoencoder

This paper is a preprint and has not been certified by peer review.

Authors

Kutuzova, S.; Piera Lindez, P.; Nor Nielsen, K.; S. Olsen, N.; Riber, L.; Gobbi, A.; Forero Junco, L. M.; Erdmann Dougherty, P.; Cairo Westergaard, J.; Christensen, S.; Hestbjerg Hansen, L.; Nielsen, M.; Nybo Nissen, J.; Rasmussen, S.

Abstract

A common procedure for studying the microbiome is binning the sequenced contigs into metagenome-assembled genomes. Currently, unsupervised and self-supervised deep-learning-based methods using co-abundance and sequence-based motifs such as tetranucleotide frequencies are state-of-the-art for metagenome binning. Taxonomic labels derived from alignment-based classification have not been widely used. Here, we propose TaxVAMB, a metagenome binning tool based on semi-supervised bi-modal variational autoencoders, combining tetranucleotide frequencies and contig co-abundances with contig annotations returned by any taxonomic classifier at any taxonomic rank. TaxVAMB outperforms all other binners on CAMI2 human microbiome datasets, returning on average 40% more near-complete assemblies than the next best binner. On real long-read datasets, TaxVAMB recovers on average 13% more near-complete bins and 14% more species. When used in a single-sample setup, TaxVAMB returns on average 83% more high-quality bins than VAMB. TaxVAMB bins incomplete genomes drastically better than any other tool, returning 255% more high-quality bins of incomplete genomes than the next best binner. Our method has immediate research and industrial applications, as well as methodological novelty that can be translated to other biological problems with semi-supervised multimodal datasets.
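
To make one of the input features named in the abstract concrete, the following is a minimal sketch of how tetranucleotide frequencies could be computed for a single contig. It is an illustration only, not TaxVAMB's implementation: the function name `tetranucleotide_frequencies` and the example sequence are hypothetical, and production binners typically apply further steps (for example, collapsing reverse complements) that are omitted here.

```python
from collections import Counter
from itertools import product


def tetranucleotide_frequencies(sequence: str) -> dict[str, float]:
    """Return the relative frequency of each of the 256 possible 4-mers.

    Hypothetical helper for illustration; windows containing characters
    other than A, C, G, T (such as N) are skipped.
    """
    sequence = sequence.upper()
    alphabet = set("ACGT")
    all_kmers = ["".join(p) for p in product("ACGT", repeat=4)]

    counts = Counter()
    # Slide a window of width 4 across the contig, one base at a time.
    for i in range(len(sequence) - 3):
        kmer = sequence[i:i + 4]
        if set(kmer) <= alphabet:
            counts[kmer] += 1

    total = sum(counts.values())
    # Normalise counts so the 256 values sum to 1 (or all 0 if no valid window).
    return {k: (counts[k] / total if total else 0.0) for k in all_kmers}


# Example contig (made up): 5 overlapping windows, "ACGT" occurs twice.
freqs = tetranucleotide_frequencies("ACGTACGT")
print(freqs["ACGT"])
```

Each contig is thereby reduced to a fixed-length numeric vector regardless of its length, which is what allows a sequence-composition signal to be combined with co-abundance and taxonomic annotations in a single model.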
