Sketching methods provide scalable solutions for analyzing rapidly growing genomic data. A recent innovation in sketching methods, syncmers, has proven effective and has been employed for read alignment. Syncmers share fundamental features with the FracMinHash technique, a recent modification of the popular MinHash algorithm that estimates similarity between sets of different sizes. While previous researchers have demonstrated the effectiveness of syncmers in read alignment, their potential for use in genomic analysis (for which FracMinHash was designed) has not been fully realized. We demonstrate that the open syncmer sketch is equivalent to a FracMinHash sketch when applied to k-mer-based similarities, yet it exhibits a superior distance distribution and better genomic coverage. Moreover, we can extend the concept of k-mer truncation to open syncmers, enabling multi-resolution estimation in metagenomics as well as flexible-sized seeding for sequence comparisons.
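To make the comparison concrete, the following is a minimal Python sketch of the two selection rules, not the authors' implementation. It assumes the standard definitions: FracMinHash keeps a k-mer when its hash falls below a fixed fraction of the hash space, and an open syncmer is a k-mer whose smallest-hash s-mer starts at a chosen offset t. The function names, the `scale` parameter, and the use of BLAKE2b as the hash are illustrative assumptions. Both rules decide membership from the k-mer alone, which is what allows either sketch to be used for k-mer-based similarity.

```python
import hashlib


def h(s: str) -> int:
    """64-bit hash of a string (BLAKE2b is an illustrative choice)."""
    return int.from_bytes(hashlib.blake2b(s.encode(), digest_size=8).digest(), "big")


def kmers(seq: str, k: int) -> list[str]:
    """All overlapping k-mers of seq, in order."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def frac_minhash(seq: str, k: int, scale: int) -> set[str]:
    """Keep k-mers whose hash lies in the lowest 1/scale of the hash space."""
    threshold = 2**64 // scale
    return {km for km in kmers(seq, k) if h(km) < threshold}


def open_syncmers(seq: str, k: int, s: int, t: int = 0) -> set[str]:
    """Keep k-mers whose minimum-hash s-mer starts at offset t (leftmost on ties)."""
    selected = set()
    for km in kmers(seq, k):
        smer_hashes = [h(km[i:i + s]) for i in range(k - s + 1)]
        if smer_hashes.index(min(smer_hashes)) == t:
            selected.add(km)
    return selected


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard index of two sketches; two empty sketches count as identical."""
    return len(a & b) / len(a | b) if (a or b) else 1.0


if __name__ == "__main__":
    # Hypothetical toy sequences; real inputs would be whole genomes.
    x = "ACGTTGCATGTCGCATGATGCATGAGAGCTACGTAGCTAGCTAGGATCGATCG"
    y = "ACGTTGCATGTCGCATGATGCATGAGAGCTTCGTAGCTAGCTAGGATCGATCG"
    print(jaccard(frac_minhash(x, 11, 4), frac_minhash(y, 11, 4)))
    print(jaccard(open_syncmers(x, 11, 8), open_syncmers(y, 11, 8)))
```

With a well-mixed hash, the open syncmer rule is expected to retain roughly one in every k - s + 1 k-mers, so choosing s such that k - s + 1 equals `scale` gives sketches of comparable size under this assumption.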