Ultrafast and accurate sequence alignment and clustering of viral genomes
This paper is a preprint and has not been certified by peer review.
Zielezinski, A.; Gudys, A.; Barylski, J.; Siminski, K.; Rozwalak, P.; Dutilh, B. E.; Deorowicz, S.
Abstract
Viromics produces millions of viral genomes and fragments annually, overwhelming traditional sequence comparison methods. We introduce Vclust, a novel approach that determines average nucleotide identity by Lempel-Ziv parsing and clusters viral genomes with thresholds endorsed by authoritative viral genomics and taxonomy consortia. Vclust demonstrates superior accuracy and efficiency compared to existing tools, clustering millions of virus genomes in a few hours on a mid-range workstation.