CLM-X: A multimodal single-cell foundation model with flexible multi-way Transformer for unified scRNA-seq and scATAC-seq analysis

This paper is a preprint and has not been certified by peer review.

Authors

Li, B.; Liu, Z.; Wang, Z.; Xu, Z.; Li, Y.; Sha, C.; Li, X.

Abstract

Advances in single-cell multimodal profiling have enabled more systematic analysis of cellular biology, yet the rapid accumulation of large-scale, heterogeneous datasets poses substantial challenges for integrative analysis. Transformer-based cell language models (CLMs) have recently emerged as powerful foundational tools for learning transferable cell representations from unimodal single-cell datasets. However, a flexible, unified multimodal foundation model for the joint modeling of scRNA-seq and scATAC-seq data is still lacking. Here, we present CLM-X, a multimodal single-cell foundation model built on a multi-way Transformer architecture. CLM-X combines a harmonized tokenization design with a stage-wise masked reconstruction pretraining strategy, enabling unified modeling of RNA-only, ATAC-only, and paired RNA-ATAC inputs within a single Transformer-based framework. We pretrain CLM-X on million-scale unimodal and multimodal datasets and systematically evaluate its transferability on five downstream tasks: batch correction, modality integration, cross-modal translation, cell type annotation, and perturbation prediction. Across comprehensive benchmarks on 10 datasets, CLM-X consistently outperforms existing multimodal methods and unimodal foundation models, with particularly clear advantages in RNA-ATAC cross-modal translation and genetic perturbation response prediction. Overall, CLM-X establishes a unified and scalable multimodal foundation model for integrative analysis of scRNA-seq and scATAC-seq data, advancing more robust, comprehensive, and biologically interpretable single-cell analysis beyond task-specific approaches and unimodal foundation models.
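The abstract does not describe the internals of the multi-way Transformer. The minimal sketch below assumes the design popularized by VLMo and BEiT-3: self-attention weights shared by all tokens, plus a separate feed-forward "expert" per modality, selected by each token's modality tag. It only illustrates how a single block can accept RNA-only, ATAC-only, or concatenated paired RNA-ATAC token sequences. The class name `MultiwayBlock`, the sizes, and the random weights are hypothetical, and this is not CLM-X's actual implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 8  # hypothetical hidden size


def layer_norm(x, eps=1e-5):
    # Normalize each token vector independently (no learned scale/shift here).
    mu = x.mean(-1, keepdims=True)
    var = x.var(-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)


def softmax(x):
    x = x - x.max(-1, keepdims=True)
    e = np.exp(x)
    return e / e.sum(-1, keepdims=True)


class MultiwayBlock:
    """Hypothetical multi-way block: shared attention, one FFN expert per modality."""

    def __init__(self, d, modalities=("rna", "atac"), hidden=32):
        # Attention projections are shared across all modalities.
        self.wq, self.wk, self.wv, self.wo = (
            rng.normal(0, d ** -0.5, (d, d)) for _ in range(4)
        )
        # Each modality gets its own two-layer feed-forward network.
        self.ffn = {
            m: (rng.normal(0, d ** -0.5, (d, hidden)),
                rng.normal(0, hidden ** -0.5, (hidden, d)))
            for m in modalities
        }

    def __call__(self, x, modality_ids):
        # x: (n_tokens, d); modality_ids: one modality tag per token.
        h = layer_norm(x)
        q, k, v = h @ self.wq, h @ self.wk, h @ self.wv
        attn = softmax(q @ k.T / np.sqrt(x.shape[-1]))
        # Every token attends to every other token, across modalities.
        x = x + (attn @ v) @ self.wo
        h = layer_norm(x)
        out = np.zeros_like(x)
        ids = np.asarray(modality_ids)
        for m, (w1, w2) in self.ffn.items():
            mask = ids == m
            if mask.any():
                # Route this modality's tokens through its own ReLU FFN expert.
                out[mask] = np.maximum(h[mask] @ w1, 0.0) @ w2
        return x + out


block = MultiwayBlock(D)
rna = rng.normal(size=(5, D))   # 5 hypothetical gene tokens
atac = rng.normal(size=(3, D))  # 3 hypothetical peak tokens
paired = block(np.vstack([rna, atac]), ["rna"] * 5 + ["atac"] * 3)
rna_only = block(rna, ["rna"] * 5)
atac_only = block(atac, ["atac"] * 3)
print(paired.shape, rna_only.shape, atac_only.shape)  # (8, 8) (5, 8) (3, 8)
```

Under this assumed design, the same weights serve all three input types: unimodal sequences simply never touch the other modality's expert, while paired sequences exchange information through the shared attention.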