Deep Learning-Based Classification of CRISPR Loci Using Repeat Sequences

Avatar
Poster
Voices Powered byElevenlabs logo
Connected to paperThis paper is a preprint and has not been certified by peer review

Deep Learning-Based Classification of CRISPR Loci Using Repeat Sequences

Authors

Liao, X.; Li, Y.; Wu, Y.; Li, X.; Shang, X.

Abstract

With the widespread application of the CRISPR-Cas system in gene editing and related fields, the demand for detecting and classifying CRISPR-Cas systems in metagenomic data has continuously increased. The traditional classification of the CRISPR-Cas system mainly relies on identifying neighboring cas genes of repeats. However, in some cases where there is a lack of information about cas genes, such as in metagenomes and fragmented genome assemblies, traditional classification methods may become ineffective. Here, we introduce a deep learning-based method called CRISPRclassify-CNN-Att, which classifies CRISPR-Cas systems solely based on repeat sequences. CRISPRclassify-CNN-Att utilizes convolutional neural networks (CNNs) and self-attention mechanisms to extract features from repeat sequences. It employs a stacking strategy to handle sample imbalances across different subtypes and improves classification accuracy for subtypes with fewer samples through transfer learning. CRISPRclassify-CNN-Att demonstrates excellent performance in classifying multiple subtypes, particularly in subtypes with a larger number of samples. Although CRISPR loci classification primarily relies on cas genes, CRISPRclassify-CNN-Att offers a new approach as a significant complement to current methods. It can identify unclassified loci missed by traditional cas-based methods, breaking the limitations of traditional approaches, and simplifying the classification process. The proposed tool is freely accessible via https://github.com/Xingyu-Liao/CRISPRclassify-CNN-Att .

Follow Us on

0 comments

Add comment