pyRBDome: A comprehensive computational platform for enhancing and interpreting RNA-binding proteome data

By: Chu, L.-C.; Christopoulou, N.; McCaughan, H.; Winterbourne, S.; Cazzola, D.; Wang, S.; Litvin, U.; Brunon, S.; Harker, P. J. B.; McNae, I.; Granneman, S.

High-throughput proteomics approaches have revolutionised the identification of RNA-binding proteins (RBPome) and RNA-binding sequences (RBDome) across organisms. Many novel putative RNA-binding proteins (RBPs) were discovered, including those that lack recognisable RNA-binding domains. Yet the extent of noise, including false-positive identifications, associated with these methodologies is difficult to quantify as experimental approaches for validating the results are generally low throughput. To address this, we introduce pyRBDome, a pipeline for in-depth in silico enhancement of RNA-binding proteome data. It does so by comparing experimental results with RNA-binding site (RBS) predictions from several distinct machine learning tools and by integrating high-resolution structural data of protein-RNA complexes when available. By providing a statistical evaluation of RBDome data, pyRBDome lets users rapidly identify protein sequences from RBDome experiments most likely to be bona fide RNA-binders. Furthermore, by leveraging the predictions collated by pyRBDome, we have enhanced the sensitivity and specificity of RBS detection through training new ensemble machine learning models. We describe a pyRBDome analysis of a large human RBDome dataset and a comparison with known structural data. These analyses reinforced the significance of stacking interactions in UV cross-linking protein-RNA interactions. Surprisingly, our analyses revealed two contrasting findings: while UV cross-linked amino acids were more likely to contain predicted RBSs, they infrequently bind RNA in high-resolution structures. Given the known limitations of structural data as benchmarks, these findings highlight the utility of pyRBDome as a valuable alternative approach for enhancing confidence in RBDome datasets. Finally, our comprehensive analysis of hundreds of (putative) RBPs offers a valuable resource for RBP enthusiasts.
Integrating Multiplexed Imaging and Multiscale Modeling Identifies Tumor Phenotype Transformation as a Critical Component of Therapeutic T Cell Efficacy

By: Hickey, J. W.; Agmon, E.; Horowitz, N.; Lamore, M.; Sunwoo, J. B.; Covert, M.; Nolan, G. P.

Cancer progression is a complex process involving interactions that unfold across molecular, cellular, and tissue scales. These multiscale interactions have been difficult to measure and to simulate. Here we integrated CODEX multiplexed tissue imaging with multiscale modeling software, to model key action points that influence the outcome of T cell therapies with cancer. The initial phenotype of therapeutic T cells influences the ability of T cells to convert tumor cells to an inflammatory, anti-proliferative phenotype. This T cell phenotype could be preserved by structural reprogramming to facilitate continual tumor phenotype conversion and killing. One takeaway is that controlling the rate of cancer phenotype conversion is critical for control of tumor growth. The results suggest new design criteria and patient selection metrics for T cell therapies, call for a rethinking of T cell therapeutic implementation, and provide a foundation for synergistically integrating multiplexed imaging data with multiscale modeling of the cancer-immune interface.
A bovine pulmosphere model and multiomics analyses identify a signature of early host response to Mycobacterium tuberculosis infection

By: Bhaskar, V.; Kumar, R.; Praharaj, M. R.; Gandham, S.; Maity, H. K.; Sarkar, U.; Dey, B.

Interactions between the tubercle bacilli and lung cells during the early stages of tuberculosis (TB) are crucial for disease outcomes. Conventional 2D cell culture inadequately replicates the multicellular complexity of lungs. We introduce a 3D pulmosphere model for Mycobacterium tuberculosis infection in bovine systems, demonstrating through comprehensive transcriptome and proteome analyses that these 3D structures closely replicate the diverse cell populations and abundant extracellular matrix proteins, emphasizing their similarity to the in vivo pulmonary environment. While both avirulent BCG and virulent M. tuberculosis-infected pulmospheres exhibit commonalities in the upregulation of several host signaling pathways, distinct features such as upregulation of ECM receptors, neutrophil chemotaxis, interferon signaling, and RIG-I signaling pathways characterize the unique early response to virulent M. tuberculosis. Moreover, a signature of seven genes/proteins, including IRF1, CCL5, CXCL8, CXCL10, ICAM1, COL17A1, and CFB, emerges as indicative of the early host response to M. tuberculosis infection. Overall, this study presents a superior ex vivo multicellular bovine pulmosphere TB model, with implications for discovering disease biomarkers, enabling high-throughput drug screening, and improving TB control strategies.
Understanding Molecular Links of Vascular Cognitive Impairment: Selective Interaction between Mutant APP, TP53, and MAPKs

By: Zeylan, M. E.; Senyuz, S.; Keskin, O.; Gursoy, A.

Vascular cognitive impairment (VCI) is an understudied cerebrovascular disease. As it can result in significant functional and cognitive disability, it is vital to identify the proteins related to it. Our study focuses on revealing proteins related to this complex disease by deciphering the crosstalk between cardiovascular and cognitive diseases. We build protein-protein interaction networks related to cardiovascular and cognitive diseases. After merging these networks, we analyze the merged network to extract the hub proteins and their interactors. We then identify the clusters in this network and build the structural protein-protein interaction network of the most connected cluster. We analyze the interactions of this network with molecular modeling via PRISM. PRISM predicted several interactions that may be novel in the context of VCI. Two mutant forms of APP (V715M and L723P), previously not connected to VCI, were discovered to interact with other proteins. Our findings demonstrate that these two mutant forms of APP interact differently with TP53 and MAPKs. Furthermore, TP53, AKT1, PARP1, and FGFR1 interact with MAPKs through their mutant conformations. We hypothesize that these interactions might be crucial for VCI. We suggest that these interactions and proteins can act as early VCI markers or as possible therapeutic targets.
Cell-state transitions and frequency-dependent interactions among subpopulations together explain the dynamics of spontaneous epithelial-mesenchymal heterogeneity in breast cancer

By: Jain, P.; Kizhuttil, R.; Nair, M. B.; Bhatia, S.; Thompson, E. W.; George, J. T.; Jolly, M. K.

Individual cells in a tumour can be distributed among Epithelial (E) and Mesenchymal (M) cell-states, as characterised by the levels of canonical E and M markers. Even after E and M (E-M) subpopulations are isolated and then cultured independently, E-M heterogeneity can re-equilibrate in each population over time, sometimes regaining the initial distribution of the parental cell population. However, it remains unclear which population-level processes give rise to the dynamical changes in E-M heterogeneity observed experimentally, including 1) differential growth, 2) cell-state switching, and 3) frequency-dependent growth or state-transition rates. Here, we analyse the necessity of these three processes in explaining the dynamics of E-M population distributions as observed in PMC42-LA and HCC38 breast cancer cells. We find that growth differences among E and M subpopulations, with and without any frequency-dependent interactions (cooperation or suppression) among E-M subpopulations, are insufficient to explain the observed population dynamics. This insufficiency is ameliorated by including cell-state transitions, albeit at slow rates, in explaining the data from both PMC42-LA and HCC38 cells. Further, our models predict that treatment of HCC38 cells with TGFbeta signalling and JAK2/3 inhibitors could significantly enhance the transition rates from M state to E state, but does not prevent transitions from E to M. Finally, we devise a selection criterion to identify the next most informative time points for which future experimental data can optimally improve the identifiability of our estimated best-fit model parameters. Overall, our study identifies the necessary population-level processes shaping the dynamics of E-M heterogeneity in breast cancer cells.
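To make the first two processes named above concrete, here is a minimal sketch of a two-compartment model with differential growth and cell-state switching. It is not the authors' model: the linear rate equations, the forward Euler integration, the function name `simulate_em_fraction`, and all parameter values are assumptions chosen for illustration, and frequency-dependent terms are left out.

```python
def simulate_em_fraction(e0, m0, r_e, r_m, k_em, k_me, t_end, dt=0.1):
    """Return the epithelial fraction E / (E + M) at time t_end.

    Hypothetical linear model (not taken from the paper):
        dE/dt = r_e * E - k_em * E + k_me * M
        dM/dt = r_m * M + k_em * E - k_me * M
    r_e, r_m are growth rates; k_em, k_me are E->M and M->E switching rates.
    Integrated with a simple forward Euler scheme.
    """
    e, m = float(e0), float(m0)
    steps = int(round(t_end / dt))
    for _ in range(steps):
        de = (r_e * e - k_em * e + k_me * m) * dt
        dm = (r_m * m + k_em * e - k_me * m) * dt
        e, m = e + de, m + dm
    return e / (e + m)


if __name__ == "__main__":
    # Illustrative rates per hour; these are NOT fitted values from the study.
    growth_only = simulate_em_fraction(1000, 0, 0.03, 0.03, 0.0, 0.0, 1000)
    with_switching = simulate_em_fraction(1000, 0, 0.03, 0.03, 0.01, 0.02, 1000)
    print("growth only:", growth_only)
    print("with switching:", with_switching)
```

In this toy model a sorted, pure-E population stays pure when only growth is present, whereas adding switching lets sorted E and sorted M populations relax towards the same mixed composition. With equal growth rates that composition is `k_me / (k_em + k_me)`.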
Predicting Phenotypic Traits Using a Massive RNA-seq Dataset

By: Hadish, J. A.; Honaas, L. A.; Ficklin, S. P.

Transcriptomic data can be used to predict environmentally impacted phenotypic traits. This type of prediction is particularly useful for monitoring difficult-to-measure phenotypic traits and has become increasingly popular for monitoring high-value agricultural crops and in precision medicine. Despite this increase in popularity, little research has been done on how many samples are required for these models to be accurate, and which normalization should be used. Here we create a massive RNA-seq dataset from publicly available Arabidopsis thaliana data with corresponding measurements for age and tissue type. We use this dataset to determine how many samples are required for accurate model prediction and which normalization method is required. We find that Median Ratios Normalization significantly increases performance when predicting age. We also find that in the case of our dataset, only a few hundred samples are required to predict tissue types, and only a few thousand samples are necessary to accurately predict age. Researchers should consider these results when choosing the number of samples in a transcriptomic experiment and during data-processing.
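For readers unfamiliar with the normalization named above, the following is a minimal sketch of the median-of-ratios idea as commonly described for count data. It assumes this is the method meant by "Median Ratios Normalization"; the function names and the toy count matrix are hypothetical, and the abstract does not say which implementation the authors used.

```python
import math
from statistics import median


def median_ratio_size_factors(counts):
    """Compute one size factor per sample.

    counts: list of genes, each a list of raw counts (one value per sample).
    Only genes with a nonzero count in every sample are used, so that the
    log of the per-gene geometric mean is defined.
    """
    n_samples = len(counts[0])
    log_rows = [[math.log(c) for c in row]
                for row in counts if all(c > 0 for c in row)]
    if not log_rows:
        raise ValueError("no gene has nonzero counts in every sample")
    # Log of each gene's geometric mean across samples.
    log_geo_means = [sum(row) / n_samples for row in log_rows]
    factors = []
    for j in range(n_samples):
        # Median over genes of log(count / geometric mean) for sample j.
        log_ratios = [row[j] - g for row, g in zip(log_rows, log_geo_means)]
        factors.append(math.exp(median(log_ratios)))
    return factors


def normalize_counts(counts):
    """Divide every count by its sample's size factor."""
    factors = median_ratio_size_factors(counts)
    return [[c / f for c, f in zip(row, factors)] for row in counts]


if __name__ == "__main__":
    # Toy matrix: sample 2 was sequenced twice as deeply as sample 1.
    toy = [[10, 20], [100, 200], [5, 10], [0, 3]]
    print(median_ratio_size_factors(toy))
    print(normalize_counts(toy))
```

Because the size factor is a median over genes, a handful of strongly changing genes has little influence on it, which is the usual motivation for this scheme over simple library-size scaling.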
Signalling-state dependent drug-tolerance in head and neck squamous cell carcinoma.

By: Karjosukarso, D. W.; Dini, A.; Wingens, L. J. A.; Liu, R.; Joosten, L. A. B.; Bussink, J.; Mulder, K. W.

Intratumor heterogeneity negatively impacts therapeutic response and patient prognosis. Besides the established role of genetic heterogeneity, non-genetic mechanisms of persistence to drug treatment are emerging. Here, we characterise cells selected for their persistence to control, epidermal growth factor receptor inhibition (EGFRi), radiation and combined treatment from low passage head and neck squamous cell carcinoma (HNSCC) cultures. Using a panel of 70 (phospho-)specific DNA-conjugated antibodies we measured activities of 8 signalling pathways, self-renewal, differentiation, DNA damage and cell-cycle, in conjunction with the transcriptional output in single cells, using our RNA and Immuno-Detection (RAID) technology. Six recurrent transcriptional programs reflecting processes including proliferation, differentiation and metabolic activity, as well as protein-based signalling-states, were associated with drug persistence, while copy number variation inference indicated involvement of non-genetic tolerance mechanisms. Projecting RNA velocity onto the antibody-derived signalling-states suggested a key role for integrin-mediated focal-adhesion signalling in drug-persistence in our cell system. Using machine-learning we derived a core transcriptional signature connected to adhesion-based drug-persistence, which was predictive of poor prognosis in a TCGA HNSCC cohort (hazard-ratio 1.87, p < 10^-5). Furthermore, functional analyses confirmed that cells expressing high levels of integrin alpha-6 (ITGA6) were tolerant to EGFRi treatment, and that forcing cells out of this cell-state through transient targeted inhibition of Focal Adhesion Kinase activity re-instated EGFRi sensitivity in drug persistent cells. Taken together, our single-cell multi-omics analysis identified an actionable adhesion-signalling mediated cell-state driving drug tolerance in HNSCC.
Bottom-up parameterization of enzyme rate constants: Reconciling inconsistent data

By: Zielinski, D.; Matos, M. R. A.; de Bree, J. E.; Glass, K.; Sonnenschein, N.; Palsson, B. O.

Kinetic models of enzymes have a long history of use for studying complex metabolic systems and designing production strains. Given the availability of enzyme kinetic data from historical experiments and machine learning estimation tools, a straightforward modeling approach is to assemble kinetic data enzyme by enzyme until a desired scale is reached. However, this type of bottom-up parameterization of kinetic models has been difficult due to a number of issues including gaps in kinetic parameters, the complexity of enzyme mechanisms, inconsistencies between parameters obtained from different sources, and in vitro-in vivo differences. Here, we present a computational workflow for the robust estimation of kinetic parameters for detailed mass action enzyme models while taking into account parameter uncertainty. The resulting software package, termed MASSef (the Mass Action Stoichiometry Simulation Enzyme Fitting package), can handle standard macroscopic kinetic parameters, including Km, kcat, Ki, Keq, and nh, as well as diverse reaction mechanisms defined in terms of mass action reactions and microscopic rate constants. We provide three enzyme case studies demonstrating that this approach can identify and reconcile inconsistent data either within in vitro experiments or between in vitro and in vivo enzyme function. The code and case studies are provided in the MASSef package built on top of the MASS Toolbox in Mathematica. This work builds on the legacy of knowledge on kinetic behavior of enzymes by enabling robust parameterization of enzyme kinetic models at scale utilizing the abundance of historical literature data and machine learning parameter estimates.
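The link between macroscopic parameters and microscopic rate constants mentioned above can be illustrated with the simplest textbook mechanism, E + S <-> ES -> E + P. The sketch below is not MASSef and does not use the MASS Toolbox (which is written in Mathematica); it is a Python illustration of the standard steady-state relation Km = (k_off + k_cat) / k_on, with hypothetical function names and made-up rate constants.

```python
import math


def michaelis_menten_parameters(k_on, k_off, k_cat):
    """Map microscopic rate constants to macroscopic (Km, kcat).

    Mechanism assumed: E + S <-> ES -> E + P, with the steady-state
    (Briggs-Haldane) relation Km = (k_off + k_cat) / k_on.
    """
    return (k_off + k_cat) / k_on, k_cat


def mass_action_steady_state_rate(k_on, k_off, k_cat, e_total, s):
    """Rate from the mass action model with ES held at steady state."""
    # Solve k_on * (e_total - ES) * s = (k_off + k_cat) * ES for ES.
    es = k_on * e_total * s / (k_on * s + k_off + k_cat)
    return k_cat * es


def michaelis_menten_rate(k_m, k_cat, e_total, s):
    """Classical Michaelis-Menten rate law."""
    return k_cat * e_total * s / (k_m + s)


if __name__ == "__main__":
    # Two made-up sets of microscopic constants (units: 1/(M s), 1/s, 1/s).
    set_a = (1.0e6, 90.0, 10.0)
    set_b = (1.0e5, 0.0, 10.0)
    km_a, kcat_a = michaelis_menten_parameters(*set_a)
    km_b, kcat_b = michaelis_menten_parameters(*set_b)
    print(km_a, kcat_a, km_b, kcat_b)
    print(math.isclose(km_a, km_b), kcat_a == kcat_b)
```

Both illustrative sets give the same Km and kcat even though their microscopic constants differ, which is one reason a fitting step that combines several data types is needed to pin down a detailed mass action model.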
Phenotypic consequences of logarithmic signaling in MAPK stress response

By: Jashnsaz, H.; Neuert, G.

How cells respond to dynamic environmental changes is crucial for understanding fundamental biological processes and cell physiology. In this study, we developed an experimental and quantitative analytical framework to explore how dynamic stress gradients that change over time regulate cellular volume, signaling activation, and growth phenotypes. Our findings reveal that gradual stress conditions substantially enhance cell growth compared to conventional acute stress. This growth advantage correlates with a minimal reduction in cell volume that depends on the dynamics of the stress. We explain the growth phenotype with our finding of a logarithmic signal transduction mechanism in the yeast Mitogen-Activated Protein Kinase (MAPK) osmotic stress response pathway. These insights into the interplay between gradual environments, cell volume change, dynamic cell signaling, and growth advance our understanding of fundamental cellular processes in gradual stress environments.
An evolution-based framework for describing human gut bacteria

By: Doran, B. A.; Chen, R. Y.; Giba, H.; Behera, V.; Barat, B.; Sundararajan, A.; Lin, H.; Sidebottom, A.; Pamer, E. G.; Raman, A.

The human gut microbiome contains many bacterial strains of the same species ("strain-level variants"). Describing strains in a biologically meaningful way rather than purely taxonomically is an important goal but challenging due to the genetic complexity of strain-level variation. Here, we measured patterns of co-evolution across >7,000 strains spanning the bacterial tree-of-life. Using these patterns as a prior for studying hundreds of gut commensal strains that we isolated, sequenced, and metabolically profiled revealed widespread structure beneath the phylogenetic level of species. Defining strains by their co-evolutionary signatures enabled predicting their metabolic phenotypes and engineering consortia from strain genome content alone. Our findings demonstrate a biologically relevant organization to strain-level variation and motivate a new schema for describing bacterial strains based on their evolutionary history.