Sequencing the gaps: dark genomic regions persist in CHM13 despite long-read advances

This paper is a preprint and has not been certified by peer review.

Authors

Wadsworth, M. E.; Page, M. L.; Aguzzoli Heberle, B.; Miller, J. B.; Steely, C.; Ebbert, M. T. W.

Abstract

Comprehensive genomic analysis is essential for advancing our understanding of human genetics and disease. However, short-read sequencing technologies are inherently limited in their ability to resolve highly repetitive, structurally complex, and low-mappability genomic regions, previously termed "dark" regions. Long-read sequencing technologies, such as those from PacBio and Oxford Nanopore Technologies (ONT), resolve these regions better, but not completely. With the release of the Telomere-to-Telomere (T2T) CHM13 reference genome, it is worth examining how a more complete reference affects dark regions. In this study, we systematically analyze dark regions across four human genome references (HG19, HG38 with and without alternate contigs, and CHM13) using both short- and long-read sequencing data. We find that dark regions increase as the reference becomes more complete, especially dark-by-MAPQ regions (those flagged because of low mapping quality), but that long-read sequencing significantly reduces the number of dark regions in the genome, particularly within gene bodies. However, we also identify potential alignment challenges in long-read data, for example in centromeric regions. These findings highlight the importance of both reference genome selection and sequencing technology choice in achieving a truly comprehensive genomic analysis.
