Science Cast

Fast barcode calling based on k-mer distances

Riko Corwin Uphoff, May 17, 2025 2:56pm


This paper is a preprint and has not been certified by peer review.


bioRxiv, May 17, 2025

Authors

Uphoff, R. C.; Schueler, S.; Grosse, I.; Mueller-Hannemann, M.

Abstract

DNA barcodes, which are short DNA strings, are regularly used as tags in pooled sequencing experiments to enable the identification of reads originating from the same sample. A crucial task in the subsequent analysis of pooled sequences is barcode calling, where one must identify the corresponding barcode for each read. This task is computationally challenging when the probability of synthesis and sequencing errors is high, as in photolithographic microarray synthesis. Identifying the most similar barcode for each read is a theoretically attractive solution for barcode calling. However, an all-to-all exact similarity calculation is practically infeasible for applications with millions of barcodes and billions of reads. Hence, several computational approaches for barcode calling have been proposed, but the challenge of developing an efficient and precise computational approach remains. Here, we propose a simple yet highly effective new barcode calling approach that uses a filtering technique based on precomputed k-mer lists. We find that this approach has a slightly higher accuracy than the state-of-the-art approach, is more than 500 times faster, and allows barcode calling for one million barcodes and one billion reads per day on a server GPU.
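The general idea of "filter with precomputed k-mer lists, then compute exact similarity only for a few candidates" can be illustrated with a minimal CPU sketch. This is not the authors' implementation: the abstract does not specify the exact k-mer distance, the final similarity measure, or how many candidates survive the filter, so the following choices are assumptions. Barcodes are ranked by the number of distinct k-mers they share with the read, plain Levenshtein distance is used for verification, and the function names (build_kmer_index, call_barcode) and the parameters k and n_candidates are hypothetical.

```python
from collections import Counter, defaultdict


def kmers(seq: str, k: int) -> set[str]:
    """Return the set of distinct k-mers in seq (empty if seq is shorter than k)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def build_kmer_index(barcodes: list[str], k: int) -> dict[str, list[int]]:
    """Precompute, for every k-mer, the list of barcode indices containing it.

    Hypothetical helper: stands in for the 'precomputed k-mer lists' in the abstract.
    """
    index: dict[str, list[int]] = defaultdict(list)
    for idx, bc in enumerate(barcodes):
        for km in kmers(bc, k):
            index[km].append(idx)
    return index


def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance, keeping only two rows."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution or match
        prev = curr
    return prev[-1]


def call_barcode(read: str, barcodes: list[str], index: dict[str, list[int]],
                 k: int, n_candidates: int = 5):
    """Filter by shared k-mers, then verify the shortlist with exact edit distance.

    Assumption: the filter score is the count of distinct shared k-mers.
    Returns (barcode_index, distance), or None if no barcode shares any k-mer.
    """
    shared: Counter = Counter()
    for km in kmers(read, k):
        for idx in index.get(km, ()):  # .get does not insert into the defaultdict
            shared[idx] += 1
    if not shared:
        return None
    candidates = [idx for idx, _ in shared.most_common(n_candidates)]
    # Exact distance is computed only for the shortlist, not for all barcodes.
    return min(((idx, levenshtein(read, barcodes[idx])) for idx in candidates),
               key=lambda pair: pair[1])


barcodes = ["ACGTACGT", "TTGCAAGC", "GGATCCTA"]
index = build_kmer_index(barcodes, k=3)
print(call_barcode("ACGTTCGT", barcodes, index, k=3))  # one substitution vs. barcode 0
```

Here the read "ACGTTCGT" shares the 3-mers ACG and CGT only with barcode 0, so that barcode is the sole candidate and the call is (0, 1). The speedup in this kind of scheme comes from replacing an all-to-all exact comparison with index lookups plus a handful of exact comparisons; a production version for millions of barcodes would need a far more efficient data layout (the preprint reports running on a server GPU), which this sketch does not attempt.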
