Science Cast

Back to sequences: find the origin of kmers

Pierre Peterlongo, October 30, 2023 1:56am

This paper is a preprint and has not been certified by peer review.

bioRxiv, October 29, 2023

Authors

Baire, A.; Peterlongo, P.

Abstract

A vast majority of bioinformatics tools dedicated to processing raw sequencing data make heavy use of the concept of kmers. This reduces data redundancy (and thus memory pressure), makes it possible to discard sequencing errors, and provides fixed-size objects that can be easily manipulated and compared to one another. A drawback is that the link between each kmer and the original sequences it belongs to is generally lost. Given the volume of data considered in this context, recovering this association is costly. In this work, we present back_to_sequences, a simple tool designed to index a set of kmers of interest and to stream a set of sequences, extracting those that contain at least one of the indexed kmers. In addition, the number of occurrences of the indexed kmers in the sequences is reported. Our results show that back_to_sequences streams ~200 short reads per millisecond, making it possible to search for kmers in hundreds of millions of reads in a matter of a few minutes.
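The index-then-stream approach summarized in the abstract can be illustrated with a minimal Python sketch. This is not the back_to_sequences implementation: the function names `canonical`, `build_index` and `stream_matches` are hypothetical, reads are assumed to arrive as in-memory `(id, sequence)` pairs rather than parsed FASTA/FASTQ records, and kmers are assumed to be compared in canonical form (the smaller of a kmer and its reverse complement) so that a read matches regardless of the strand it was sequenced from.

```python
from typing import Dict, Iterable, Iterator, Tuple

# Complement table for DNA; characters outside ACGTN pass through unchanged.
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def canonical(kmer: str) -> str:
    """Return the lexicographically smaller of a kmer and its reverse complement.

    Assumption of this sketch: strand-independent matching via canonical kmers.
    """
    rc = kmer.translate(_COMPLEMENT)[::-1]
    return min(kmer, rc)


def build_index(kmers: Iterable[str]) -> Dict[str, int]:
    """Index the kmers of interest; each canonical kmer maps to an occurrence counter."""
    return {canonical(k.upper()): 0 for k in kmers}


def stream_matches(
    reads: Iterable[Tuple[str, str]], index: Dict[str, int], k: int
) -> Iterator[Tuple[str, str, int]]:
    """Yield (read_id, sequence, n_hits) for each read sharing >= 1 kmer with the index.

    Reads are processed one at a time (streamed), and index[kmer] is incremented
    for every occurrence found, so after the stream is consumed the index holds
    per-kmer occurrence counts over all reads.
    """
    for read_id, seq in reads:
        seq = seq.upper()
        hits = 0
        # Slide a window of length k over the read; reads shorter than k yield no kmers.
        for i in range(len(seq) - k + 1):
            key = canonical(seq[i : i + k])
            if key in index:
                index[key] += 1
                hits += 1
        if hits:
            yield read_id, seq, hits


# Usage example (hypothetical data).
index = build_index(["ACGTA", "TTTTT"])  # TTTTT is stored as its canonical form AAAAA
reads = [("r1", "GGACGTACC"), ("r2", "CCCCCCCC"), ("r3", "AAAAAAA")]
matches = list(stream_matches(reads, index, k=5))
print(matches)  # [('r1', 'GGACGTACC', 1), ('r3', 'AAAAAAA', 3)]
print(index)    # {'ACGTA': 1, 'AAAAA': 3}
```

Because each kmer lookup is an average constant-time hash query, the work per read in this sketch grows linearly with read length, and only the indexed kmers (not the reads) are held in memory.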
