Science Cast

Scan Cluster: A versatile database-independent prediction tool for multi-genome identification of homologous gene clusters.

Mauricio Lozano, May 2, 2026 2:56am

Voice is AI-generated

This paper is a preprint and has not been certified by peer review.

bioRxiv, May 1, 2026 12:00am

Authors

Mogro, E. G.; Pagnutti, A. L.; Zapata, G.; Lozano, M. J.

Abstract

The exploration of colocalized gene sets, such as Biosynthetic Gene Clusters (BGCs) and symbiotic islands, is fundamental in modern genome mining. However, many existing prediction tools rely heavily on curated databases or predefined rules, inherently biasing detection toward known clusters. To address these limitations, we introduce Scan Cluster, a robust and flexible Python-based bioinformatic tool designed to identify user-defined, conserved gene clusters across diverse genomes without database-driven constraints. Scan Cluster leverages BLAST or HMMER to detect homologous proteins and evaluates their colocalization, accommodating complex evolutionary events such as orientation inversions, the insertion of alien genes, gene deletions, and the integration of insertion sequences. Beyond identification, the software performs progressive multiple cluster alignments and generates distance trees to assess cluster similarity, producing outputs ready for visualization in iTOL and Clinker. We validated Scan Cluster against standard tools like antiSMASH and DeepBGC, demonstrating high accuracy in delineating complete cluster boundaries. Its utility was further confirmed through the analysis of the nos and complex nod-nif-fix symbiotic gene clusters in rhizobia strains, successfully tracking genetic decay and grouping diverse architectures. Scan Cluster provides an accessible, low-resource framework to explore the evolutionary and functional dynamics of novel genetic clusters.
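
The colocalization step described in the abstract can be illustrated with a minimal sketch. This is not Scan Cluster's actual code or interface: the `Hit` record, the `find_colocalized` function, and the `max_gap` and `min_queries` parameters are hypothetical names chosen for illustration. The sketch assumes homology hits (for example from BLAST or HMMER) have already been mapped to ordinal gene positions on each contig. It groups hits that lie within a tolerated number of intervening genes, ignores strand so that inverted arrangements still group together, and keeps groups that cover enough distinct query proteins.

```python
from collections import defaultdict
from typing import NamedTuple


class Hit(NamedTuple):
    """Hypothetical record for one homology hit (not a Scan Cluster type)."""
    contig: str
    gene_index: int  # ordinal position of the gene along the contig
    query: str       # query protein this gene is homologous to
    strand: int      # +1 or -1; ignored so inverted clusters still group


def find_colocalized(hits, max_gap=2, min_queries=3):
    """Group hits separated by at most `max_gap` intervening genes.

    Groups covering fewer than `min_queries` distinct query proteins
    are discarded. Both thresholds are illustrative assumptions.
    """
    by_contig = defaultdict(list)
    for hit in hits:
        by_contig[hit.contig].append(hit)

    groups = []
    for contig_hits in by_contig.values():
        contig_hits.sort(key=lambda h: h.gene_index)
        current = [contig_hits[0]]
        for hit in contig_hits[1:]:
            # Number of genes lying between this hit and the previous one.
            intervening = hit.gene_index - current[-1].gene_index - 1
            if intervening <= max_gap:
                current.append(hit)
            else:
                groups.append(current)
                current = [hit]
        groups.append(current)

    return [g for g in groups if len({h.query for h in g}) >= min_queries]


# Illustrative input: gene names follow the nos cluster mentioned in the
# abstract, but the positions and contig names are made up for the example.
hits = [
    Hit("chr1", 10, "nosR", +1),
    Hit("chr1", 11, "nosZ", +1),
    Hit("chr1", 13, "nosD", -1),  # one inserted gene, opposite strand
    Hit("chr1", 14, "nosF", +1),
    Hit("chr1", 40, "nosZ", +1),  # isolated homolog, far from the others
    Hit("plasmidA", 5, "nosR", +1),
    Hit("plasmidA", 6, "nosZ", +1),  # only two distinct queries
]
clusters = find_colocalized(hits)
print([(h.contig, h.gene_index, h.query) for h in clusters[0]])
```

With these made-up hits, only the four neighbouring genes on `chr1` form a candidate cluster: the isolated homolog at position 40 and the two-gene group on `plasmidA` fall below the `min_queries` threshold. Counting intervening genes rather than base pairs is one simple way to tolerate inserted alien genes or insertion sequences; the published tool may use different criteria.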
