Taxonomic classification of metagenomic reads is a well-studied yet challenging problem. Identifying species that belong to ranks without close representation in a reference dataset is particularly challenging. While k-mer-based methods perform well in terms of running time and accuracy, their accuracy drops for novel species. Here, we show that locality-sensitive hashing (LSH) can increase the sensitivity of k-mer-based search. Our method, which combines LSH with several heuristic techniques, including soft LCA labeling and voting, is more accurate than alternatives in both taxonomic classification and profiling.
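To illustrate how LSH can make k-mer search tolerant of mismatches, the following is a minimal sketch of one common LSH family for Hamming distance (bit sampling), not the implementation described in this work. Each hash table keys a k-mer by a subset of its positions, so a query k-mer that differs from a reference k-mer only at unsampled positions still lands in the same bucket. All function names, the mask layout, and the parameters (`h`, `n_tables`, `max_dist`) are assumptions made for illustration.

```python
import random
from collections import defaultdict


def make_masks(k, h, n_tables, seed=0):
    """Draw n_tables random subsets of h positions out of k (hypothetical helper)."""
    rng = random.Random(seed)
    return [tuple(sorted(rng.sample(range(k), h))) for _ in range(n_tables)]


def lsh_keys(kmer, masks):
    """One key per table: the table id plus the characters at the sampled positions."""
    return [(t, "".join(kmer[i] for i in mask)) for t, mask in enumerate(masks)]


def build_index(reference_kmers, masks):
    """Map every LSH key to the set of reference k-mers that produce it."""
    index = defaultdict(set)
    for kmer in reference_kmers:
        for key in lsh_keys(kmer, masks):
            index[key].add(kmer)
    return index


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def query(kmer, index, masks, max_dist):
    """Collect bucket-mates from every table, then keep those within max_dist."""
    candidates = set()
    for key in lsh_keys(kmer, masks):
        candidates |= index.get(key, set())
    return {c for c in candidates if hamming(c, kmer) <= max_dist}
```

With the hand-picked masks `[(0, 1, 2, 3), (4, 5, 6, 7)]` and a reference containing `ACGTACGT`, the query `ACGTACGA` (one mismatch at the last position) still collides with the reference in the first table, whereas an exact-match k-mer lookup would miss it. Using more tables raises the chance that at least one mask avoids every mismatched position, at the cost of more memory.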