Science Cast

Refining the cis-regulatory grammar learned by sequence-to-activity models by increasing model resolution

Sara Mostafavi, January 28, 2025 2:41am

Connected to paper

This paper is a preprint and has not been certified by peer review.

Refining the cis-regulatory grammar learned by sequence-to-activity models by increasing model resolution

bioRxiv, January 27, 2025 12:00am

Authors

Chandra, N. A.; Hu, Y.; Buenrostro, J.; Mostafavi, S.; Sasse, A.

Abstract

Chromatin accessibility can be measured genome-wide with ATAC-seq, enabling the discovery of regulatory regions that control gene expression and determine cell type. Deep genomic sequence-to-function (S2F) models link underlying genomic sequences to the measured chromatin state and identify motifs that regulate chromatin accessibility. Previously, we developed AI-TAC, an S2F model that predicts chromatin accessibility across 81 immune cell types and identifies sequence patterns that control their differential ATAC-seq signals. While AI-TAC provided valuable insights into the regulatory patterns that govern immune cell differentiation, later research established that ATAC-seq profiles (the distribution of Tn5 cuts) contain additional information about the exact location and strength of TF binding. To make use of this additional information, we developed bpAI-TAC, a multi-task neural network that models ATAC-seq at base-pair resolution across 90 immune cell types. We show that adding ATAC-profile information consistently improves predictions of differential chromatin accessibility. We also demonstrate that simultaneous learning of related cell types through multi-task modeling leads to better predictions than single-task models. We then present a systematic framework for attributing differences in model performance to differences in what the models have learned. To understand what additional information bpAI-TAC gleans from ATAC-profiles, we use sequence attributions and identify motifs whose effect sizes differ when the model is trained on profiles. We conclude that modeling ATAC-seq at base-pair resolution enables the model to learn a more sensitive representation of the regulatory syntax that drives differences between immunocytes, and therefore will improve predictions of variant effects.
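
The distinction the abstract draws between total accessibility and the ATAC-seq profile (the distribution of Tn5 cuts) can be made concrete with a small sketch. The abstract does not state how bpAI-TAC is trained, so everything below is an assumption for illustration: the split of a per-base cut track into a total count and a normalized shape, the multinomial profile loss, the squared-error count loss, and all function names are hypothetical and are not taken from the paper.

```python
import math

# Illustrative sketch only: these helpers are hypothetical and are NOT
# the bpAI-TAC implementation. They show one common way to separate a
# base-pair-resolution cut track into "how much" and "where".

def split_counts_and_profile(cuts):
    """Split per-base Tn5 cut counts into (total count, profile shape)."""
    total = sum(cuts)
    if total == 0:
        # No cuts observed: use a uniform shape so it still sums to 1.
        return 0, [1.0 / len(cuts)] * len(cuts)
    return total, [c / total for c in cuts]

def softmax(logits):
    """Numerically stable softmax over positions."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]

def profile_nll(cuts, profile_logits):
    """Multinomial negative log-likelihood of the observed cuts under the
    predicted shape (the constant multinomial coefficient is dropped)."""
    probs = softmax(profile_logits)
    return -sum(c * math.log(p) for c, p in zip(cuts, probs) if c > 0)

def count_loss(cuts, predicted_log_count):
    """Squared error between log(1 + total cuts) and the prediction."""
    return (math.log1p(sum(cuts)) - predicted_log_count) ** 2

# Two toy regions with the same total accessibility but different shapes.
peaked = [0, 1, 6, 1, 0]
diffuse = [2, 1, 2, 1, 2]
total_peaked, shape_peaked = split_counts_and_profile(peaked)
total_diffuse, shape_diffuse = split_counts_and_profile(diffuse)
```

In this toy case a model trained only on totals receives the same target for both regions, while a profile-aware loss separates them. That is the sense in which profiles carry extra information about where binding occurs.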
