Scan Cluster: A versatile database-independent prediction tool for multi-genome identification of homologous gene clusters.

This paper is a preprint and has not been certified by peer review.

Authors

Mogro, E. G.; Pagnutti, A. L.; Zapata, G.; Lozano, M. J.

Abstract

The exploration of colocalized gene sets, such as Biosynthetic Gene Clusters (BGCs) and symbiotic islands, is fundamental in modern genome mining. However, many existing prediction tools rely heavily on curated databases or predefined rules, which inherently biases detection toward known clusters. To address these limitations, we introduce Scan Cluster, a robust and flexible Python-based bioinformatics tool designed to identify user-defined, conserved gene clusters across diverse genomes without database-driven constraints. Scan Cluster uses BLAST or HMMER to detect homologous proteins and then evaluates their colocalization, accommodating complex evolutionary events such as orientation inversions, insertions of alien genes, gene deletions, and the integration of insertion sequences. Beyond identification, the software performs progressive multiple cluster alignments and generates distance trees to assess cluster similarity, producing outputs ready for visualization in iTOL and Clinker. We validated Scan Cluster against standard tools such as antiSMASH and DeepBGC, demonstrating high accuracy in delineating complete cluster boundaries. Its utility was further confirmed through the analysis of the nos and the complex nod-nif-fix symbiotic gene clusters in rhizobial strains, where it successfully tracked genetic decay and grouped diverse cluster architectures. Scan Cluster provides an accessible, low-resource framework for exploring the evolutionary and functional dynamics of novel gene clusters.
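The colocalization step described in the abstract can be illustrated with a minimal Python sketch. This is not Scan Cluster's implementation: the Hit record, the find_clusters function, and the max_gap and min_queries parameters are hypothetical, and the sketch assumes that homology hits (from BLAST or HMMER) have already been mapped to ordinal gene positions on each contig. Consecutive hits on the same contig are merged while the number of intervening non-hit genes stays within max_gap, which tolerates inserted alien genes or insertion sequences. Strand is ignored when grouping, so inverted segments remain in one cluster, and a group is kept if it contains hits to at least min_queries distinct query proteins, which tolerates deletion of some members.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """Hypothetical record for one homology hit already placed on a contig."""
    query: str   # query protein (cluster member) that produced the hit
    contig: str  # contig or replicon identifier
    index: int   # ordinal position of the hit gene along the contig
    strand: int  # +1 or -1; not used for grouping, so inversions are tolerated


def find_clusters(hits, max_gap=3, min_queries=3):
    """Group hits that lie close together on the same contig (illustrative only).

    max_gap: maximum number of non-hit genes allowed between consecutive hits.
    min_queries: minimum number of distinct queries a group must contain.
    """
    by_contig = {}
    for h in hits:
        by_contig.setdefault(h.contig, []).append(h)

    groups = []
    for _, contig_hits in sorted(by_contig.items()):
        contig_hits.sort(key=lambda h: h.index)
        current = [contig_hits[0]]
        for h in contig_hits[1:]:
            # Genes strictly between the previous hit and this one.
            if h.index - current[-1].index - 1 <= max_gap:
                current.append(h)
            else:
                groups.append(current)
                current = [h]
        groups.append(current)

    # Keep only groups that recover enough distinct cluster members.
    return [g for g in groups if len({h.query for h in g}) >= min_queries]


# Toy positions (made-up data): nodC is inverted and separated by two unrelated genes.
hits = [
    Hit("nodA", "c1", 10, 1),
    Hit("nodB", "c1", 11, 1),
    Hit("nodC", "c1", 14, -1),
    Hit("nifH", "c1", 40, 1),
]
clusters = find_clusters(hits, max_gap=3, min_queries=3)
print([h.query for h in clusters[0]])  # prints ['nodA', 'nodB', 'nodC']
```

With these toy positions, nodA, nodB and the inverted nodC form one group, while the distant nifH hit is discarded because its group contains only one query. Lowering max_gap to 1 splits the group and nothing is reported, which shows how the gap tolerance trades sensitivity to insertions against the risk of merging unrelated neighbouring loci.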
