Science Cast

Tomtom-lite: Accelerating Tomtom enables large-scale and real-time motif similarity scoring

librarian, May 31, 2025 9:56pm


This paper is a preprint and has not been certified by peer review.

bioRxiv, May 31, 2025

Authors

Schreiber, J.

Abstract

Pairwise sequence similarity is a core operation in genomic analysis, yet most attention has been given to sequences made up of discrete characters. With the growing prevalence of machine learning, calculating similarities for sequences of continuous representations, e.g., frequency-based position-weight matrices (PWMs), attribution-based contribution-weight matrices, and even learned embeddings, is taking on newfound importance. Tomtom has previously been proposed as an algorithm for identifying pairs of PWMs whose similarity is statistically significant, but its implementation remains inefficient for both real-time and large-scale analysis. Accordingly, we have re-implemented Tomtom as a Numba-accelerated Python function that is natively multi-threaded, avoids cache misses, more efficiently caches intermediate values, and uses approximations at compute bottlenecks. Here, we provide a detailed description of the original Tomtom method (see Supplementary Note 1) and present results demonstrating that our re-implementation can achieve over a thousand-fold speedup compared with the original tool on reasonable tasks (see Supplementary Note 2).
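To make the operation described in the abstract concrete, here is a minimal pure-Python sketch of Tomtom-style motif comparison. It follows the general shape of the original Tomtom approach as we understand it: an ungapped offset alignment between two PWMs, a per-query-column null distribution built from the columns of a motif database, and a p-value obtained by convolving the discretized column nulls. It is not the paper's code or the Tomtom-lite API. The function names (`column_score`, `best_offset`, `column_null`, `convolve`, `alignment_pvalue`) are hypothetical, and several choices are assumptions: PWMs are lists of four-probability columns (A, C, G, T), the column score is negative Euclidean distance, only offsets where the query fits fully inside the target are scored, and no correction is made for the number of offsets tried. None of the speedups the abstract lists (Numba, multi-threading, caching, approximations) are attempted here.

```python
import math
from typing import Dict, List, Tuple

# Assumed representation: a PWM is a list of columns, each with four
# probabilities in the order A, C, G, T.
PWM = List[List[float]]


def column_score(a: List[float], b: List[float]) -> float:
    """Column similarity as negative Euclidean distance (an illustrative choice)."""
    return -math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def best_offset(query: PWM, target: PWM) -> Tuple[float, int]:
    """Slide the query along the target without gaps; return (best score, offset).

    Simplification: only offsets where the query lies fully inside the target.
    """
    if len(query) > len(target):
        raise ValueError("this sketch assumes len(query) <= len(target)")
    best_score, best_off = -math.inf, 0
    for off in range(len(target) - len(query) + 1):
        s = sum(column_score(q, target[off + i]) for i, q in enumerate(query))
        if s > best_score:
            best_score, best_off = s, off
    return best_score, best_off


def column_null(query_col: List[float], database: List[PWM], scale: int) -> Dict[int, float]:
    """Empirical null for one query column: its scores against every column
    of every database motif, discretized to integer bins via `scale`."""
    counts: Dict[int, int] = {}
    total = 0
    for motif in database:
        for col in motif:
            k = round(column_score(query_col, col) * scale)
            counts[k] = counts.get(k, 0) + 1
            total += 1
    return {k: c / total for k, c in counts.items()}


def convolve(p: Dict[int, float], q: Dict[int, float]) -> Dict[int, float]:
    """Distribution of the sum of two independent discretized scores."""
    out: Dict[int, float] = {}
    for a, pa in p.items():
        for b, qb in q.items():
            out[a + b] = out.get(a + b, 0.0) + pa * qb
    return out


def alignment_pvalue(query: PWM, target: PWM, database: List[PWM],
                     scale: int = 100) -> Tuple[int, float]:
    """Return (offset, p-value) for the best full-overlap alignment.

    No correction for the number of offsets tried (a further simplification).
    """
    _, off = best_offset(query, target)
    # Null distribution of the summed score, assuming independent columns.
    dist: Dict[int, float] = {0: 1.0}
    for col in query:
        dist = convolve(dist, column_null(col, database, scale))
    # Observed score, discretized the same way as the null.
    observed = sum(round(column_score(q, target[off + i]) * scale)
                   for i, q in enumerate(query))
    # One-sided p-value: probability of a null score at least as high.
    pvalue = sum(p for k, p in dist.items() if k >= observed)
    return off, pvalue
```

For example, with one-hot columns `A, C, G, T` and `target = [T, A, C, G]` used as its own one-motif database, `alignment_pvalue([A, C], target, [target])` returns `(1, 0.0625)`. The dictionary-based convolution keeps the sketch dependency-free; its nested loops over query columns, database columns, and score bins are the kind of hot path where array layouts, caching of intermediate values, and compiled multi-threaded code would matter in practice.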
