rhapsodist: a reproducible Snakemake workflow for BD Rhapsody single-cell RNA-seq data
Moro, G. A.; Wang, J.; Sendoel, A.; Mallona, I.
Abstract
BD Rhapsody is a widely used single-cell RNA-seq (scRNA-seq) platform for profiling gene expression. Because the platform relies on barcoded beads that come in multiple barcode versions and layouts, processing BD Rhapsody experiments is demanding: barcodes are difficult to locate by scanning reads positionally. We introduce rhapsodist, a reproducible and scalable Snakemake workflow that processes BD Rhapsody whole transcriptome analysis (WTA) data from raw FASTQ reads and generates count tables in an HDF5-backed SingleCellExperiment format. It supports all generations of BD Rhapsody beads, regardless of the barcode allowlist version and the presence of a variable-length diversity insert. rhapsodist orchestrates STARsolo, kallisto/bustools, salmon/alevin, and the official BD Rhapsody CWL pipeline, enabling direct cross-aligner comparison. Sample tag demultiplexing, read downsampling, tunable linker-mismatch tolerance, and per-sample reports are available as options. We show that results are highly consistent across aligners and highlight where the aligners differ (e.g., in quantification, clustering, or speed). On simulated data, all four aligners recovered the true barcodes with perfect precision and recall and yielded equivalent UMI counts. On HeLa cells (a cell line; enhanced beads), correlations across aligners were high in every comparison (Pearson r of 0.89 to 0.96 per cell and 0.94 to 0.98 for pseudobulk). On a mouse epidermis dataset with a more complex transcriptional profile (e.g., with multiple cell types; legacy beads), cell-type pseudobulk correlations were similarly consistent (Pearson r of 0.92 to 0.98), pointing to robust calling. As expected, the pseudoaligners (kallisto/bustools and salmon/alevin) were the fastest, and BD's workflow recovered a higher number of barcodes and UMIs.
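The two kinds of cross-aligner agreement reported above (per cell and pseudobulk) can be illustrated with a minimal sketch. It is not taken from rhapsodist itself: the function names `per_cell_pearson` and `pseudobulk_pearson` are hypothetical, and the sketch assumes two genes-by-cells count matrices whose genes and cell barcodes have already been matched between aligners. Whether counts are transformed before correlation is not stated in the abstract, so raw counts are used here as an assumption.

```python
import numpy as np


def per_cell_pearson(a, b):
    """Pearson r per cell (column) between two genes x cells matrices.

    Assumes both matrices share the same gene order and cell order.
    Cells with zero variance in either matrix yield NaN.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    a_c = a - a.mean(axis=0)  # center each cell's profile
    b_c = b - b.mean(axis=0)
    num = (a_c * b_c).sum(axis=0)
    den = np.sqrt((a_c ** 2).sum(axis=0) * (b_c ** 2).sum(axis=0))
    with np.errstate(divide="ignore", invalid="ignore"):
        return np.where(den > 0, num / den, np.nan)


def pseudobulk_pearson(a, b):
    """Pearson r between per-gene totals summed over all cells."""
    a_sum = np.asarray(a, dtype=float).sum(axis=1)
    b_sum = np.asarray(b, dtype=float).sum(axis=1)
    return float(np.corrcoef(a_sum, b_sum)[0, 1])


# Toy data standing in for the output of two aligners (purely synthetic).
rng = np.random.default_rng(0)
counts_a = rng.poisson(5, size=(200, 10))            # 200 genes, 10 cells
counts_b = counts_a + rng.poisson(1, size=(200, 10))  # second aligner, with noise

r_cells = per_cell_pearson(counts_a, counts_b)   # one r per cell
r_bulk = pseudobulk_pearson(counts_a, counts_b)  # a single r
```

For per-cell-type pseudobulks, as in the mouse epidermis comparison, the same `pseudobulk_pearson` call would be applied to the subset of columns belonging to each cell type.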