The genomes of the Macadamia genus

By: Sharma, P.; Masouleh, A.; Constantin, L.; Topp, B.; Furtado, A.; Henry, R. J.

Macadamia, a genus native to Eastern Australia, comprises four species: Macadamia integrifolia, M. tetraphylla, M. ternifolia, and M. jansenii. Macadamia was recently domesticated, largely from a limited gene pool of Hawaiian germplasm, and has become a commercially significant nut crop. Challenges of disease susceptibility and climate adaptability highlight the need to use a wider range of genetic resources in macadamia production. High-quality haploid-resolved genome assemblies were generated using HiFiasm to allow comparison of the genomes of the four species. Assembly sizes ranged from 735 Mb to 795 Mb and N50 from 53.7 Mb to 56 Mb, indicating high assembly contiguity, with most chromosomes covered telomere to telomere. Repeat analysis revealed that approximately 61% of the genomes consisted of repetitive sequence. BUSCO completeness scores ranged from 95.0% to 98.9%, confirming good coverage of the genomes. Gene prediction identified 37,198 to 40,534 genes. The Ks distribution plots of Macadamia and Telopea suggest that Macadamia underwent a whole genome duplication event prior to the divergence of the four species and that the Telopea genome was duplicated more recently. Synteny analysis revealed high conservation and similarity of genome structure across all four species. Differences between the species were found in the content of genes for fatty acid and cyanogenic glycoside biosynthesis. An antimicrobial gene with a conserved cysteine motif was found in all four species. The four genomes provide references for exploring genetic variation across the genus in wild and domesticated germplasm to support plant breeding.

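For readers less familiar with the N50 statistic quoted above: it is the length L such that sequences of length L or longer together make up at least half of the total assembly length. A minimal Python sketch of that standard definition follows; the function name and toy contig lengths are illustrative and not taken from the study.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths (bp)."""
    if not lengths:
        raise ValueError("lengths must be non-empty")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence down, accumulating assembled length.
    for length in sorted(lengths, reverse=True):
        running += length
        # Stop at the first sequence that brings coverage to half the total.
        if 2 * running >= total:
            return length


# Toy example: total = 54 bp; 10 + 9 + 8 = 27 bp reaches half, so N50 = 8.
print(n50([2, 3, 4, 5, 6, 7, 8, 9, 10]))  # 8
```

Comparing 2 * running with the total keeps the arithmetic in integers, so no rounding issue arises when the cumulative length lands exactly on the halfway point.
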
Impact of N-terminal mutated TcUBP1 knockdown on the transcriptome profiling of epimastigote cells: An RNA-Seq study in Trypanosoma cruzi

By: Sabalette, K. B.; Campo, V. A.; Sotelo-Silveira, J. R.; Smircich, P.; De Gaudenzi, J. G.

During its life cycle, the human pathogen Trypanosoma cruzi must quickly adapt to different environments, a process in which variation in the expression of the regulatory U-rich RNA-binding protein 1 (TcUBP1) plays a crucial role. We have previously demonstrated that the overexpression of TcUBP1 in insect-dwelling epimastigotes orchestrates an RNA regulon to promote differentiation to infective forms. In the present study, in an attempt to generate TcUBP1 knockout parasites using CRISPR-Cas9 technology, we obtained a variant transcript that encodes a protein with 95% overall identity and a modified N-terminal sequence. The expression of this mutant protein, named TcUBP1mut, was notably reduced compared to that of the endogenous form found in normal cells. TcUBP1mut-knockdown epimastigotes exhibited normal growth and differentiation into infective metacyclic trypomastigotes and were capable of infecting mammalian cells. We analyzed the RNA-Seq expression profiles of these parasites and identified 276 upregulated and 426 downregulated genes relative to the wild-type control sample. RNA-Seq comparison across distinct developmental stages revealed that the transcriptomic profile of these TcUBP1mut-knockdown epimastigotes differs significantly not only from that of stationary-phase epimastigotes but also from the gene expression landscape characteristic of infective forms. These results are partly opposite to, and partly consistent with, those of our recent study of TcUBP1-overexpressing cells. Together, our findings demonstrate that the genes exhibiting opposite changes under overexpression and knockdown conditions reveal key mRNA targets regulated by TcUBP1. These mostly comprise transcripts encoding trypomastigote-specific surface glycoproteins and ribosomal proteins, supporting a role for TcUBP1 in determining the molecular characteristics of the infective stage.

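The up/down gene counts and the overexpression-versus-knockdown comparison described above follow a common pattern. The Python sketch below is one possible way to express it; the gene IDs are made up, and the cutoffs (absolute log2 fold change of at least 1, adjusted p below 0.05) are conventional defaults assumed here, since the abstract does not state the thresholds used.

```python
from dataclasses import dataclass


@dataclass
class GeneResult:
    gene_id: str
    log2_fold_change: float  # knockdown relative to wild type
    padj: float  # multiple-testing adjusted p-value


def classify(results, lfc_cutoff=1.0, padj_cutoff=0.05):
    """Split genes into up- and downregulated lists (cutoffs are assumptions)."""
    up, down = [], []
    for r in results:
        if r.padj >= padj_cutoff:
            continue  # not significant after adjustment
        if r.log2_fold_change >= lfc_cutoff:
            up.append(r.gene_id)
        elif r.log2_fold_change <= -lfc_cutoff:
            down.append(r.gene_id)
    return up, down


def opposite_changes(oe_up, oe_down, kd_up, kd_down):
    """Genes that move in opposite directions under overexpression and knockdown."""
    return (set(oe_up) & set(kd_down)) | (set(oe_down) & set(kd_up))


kd = [
    GeneResult("gene_1", 2.0, 0.01),
    GeneResult("gene_2", -1.5, 0.001),
    GeneResult("gene_3", 3.0, 0.20),  # large change but not significant
    GeneResult("gene_4", 0.5, 0.01),  # significant but below the fold-change cutoff
]
kd_up, kd_down = classify(kd)
print(kd_up, kd_down)  # ['gene_1'] ['gene_2']
print(opposite_changes(["gene_2"], ["gene_1", "gene_5"], kd_up, kd_down))
```

Requiring both a significance and a fold-change cutoff is a design choice: it keeps statistically significant but very small changes out of the up/down lists.
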
GeneMAP: A discovery platform for metabolic gene function

By: Birsoy, K.; Gamazon, E.; Khan, A.; Unlu, G.; Lin, P.; Liu, Y.; Kilic, E.; Kenny, T. C.

Organisms maintain metabolic homeostasis through the combined functions of small molecule transporters and enzymes. While many of the metabolic components have been well-established, a substantial number remains without identified physiological substrates. To bridge this gap, we have leveraged large-scale plasma metabolome genome-wide association studies (GWAS) to develop a multiomic Gene-Metabolite Associations Prediction (GeneMAP) discovery platform. GeneMAP can generate accurate predictions, even pinpointing genes that are distant from the variants implicated by GWAS. In particular, our work identified SLC25A48 as a genetic determinant of plasma choline levels. Mechanistically, SLC25A48 loss strongly impairs mitochondrial choline import and synthesis of its downstream metabolite, betaine. Rare variant testing and polygenic risk score analyses have elucidated choline-relevant phenomic consequences of SLC25A48 dysfunction. Altogether, our study proposes SLC25A48 as a mitochondrial choline transporter and provides a discovery platform for metabolic gene function.

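The polygenic risk score analyses mentioned above rest on a simple weighted sum: each variant's effect-allele dosage (0, 1 or 2) multiplied by its GWAS effect size. The sketch below illustrates only that core arithmetic; the variant IDs and weights are hypothetical, and real pipelines add steps (variant clumping, p-value thresholding, handling of missing genotypes) that are not shown.

```python
def polygenic_risk_score(dosages, weights):
    """Weighted sum of effect-allele dosages.

    dosages: dict of variant ID -> effect-allele count (0, 1 or 2)
    weights: dict of variant ID -> per-allele effect size (e.g. a GWAS beta)
    """
    score = 0.0
    for variant, beta in weights.items():
        # Simplification: a variant missing from the genotype data adds nothing.
        score += beta * dosages.get(variant, 0)
    return score


# Hypothetical variants; the score is 2*0.1 + 1*(-0.3) + 0*0.5, i.e. about -0.1.
print(polygenic_risk_score({"var_a": 2, "var_b": 1},
                           {"var_a": 0.1, "var_b": -0.3, "var_c": 0.5}))
```
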
Newly obtained genome of fungi-related amoeba is enriched with genes shared with animals-related protists

By: Pozdnyakov, I.; Potapenko, E. V.; Kalashnikova, V. M.; Barzasekova, C. O.; Zlatogursky, V. V.; Sukhanova, K. M.; Babenko, V. V.; Boldyreva, D. I.

Nucleariids are a group of Opisthokonta forming the deepest branch in Holomycota, one of the two major Opisthokonta clades, which contains Fungi as its crown group. They are the only members of Holomycota retaining the filose amoeboid state ancestral for Opisthokonta. The newly assembled genome of Nuclearia thermophila (Holomycota, Rotosphaerida) has a total length of 49 Mb, 15,321 protein-coding genes and a GC content of 44%. This is the first sequenced genome for this genus and the third for Rotosphaerida as a whole. N. thermophila was shown to share more protein domains with Holozoa than with the rest of Holomycota. Protein domains that were presumably acquired or lost by the common ancestors of the Holomycota and Holozoa groups were identified. The Holomycota ancestor probably had more gains and losses of protein domains than the Holozoa ancestor, particularly for metabolism-related domains. However, this trend should be confirmed by studying the genomes of free-living organisms of the Teretosporea group.

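The 44% figure above is the genome's GC content. A minimal Python sketch of the calculation follows; excluding ambiguous bases (N) from the denominator is a common convention assumed here, not a detail given in the abstract.

```python
def gc_percent(seq):
    """GC content as a percentage of unambiguous (A/C/G/T) bases."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        raise ValueError("sequence contains no A/C/G/T bases")
    # Ambiguous bases such as N are excluded from the denominator (an assumption).
    return 100.0 * (seq.count("G") + seq.count("C")) / acgt


print(gc_percent("ATGCNNatgc"))  # 50.0
```
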
The mutational landscape of Staphylococcus aureus during colonisation

By: Coll, F.; Blane, B.; Bellis, K.; Matuszewska, M.; Jamrozy, D.; Toleman, M.; Geoghegan, J. A.; Parkhill, J.; Massey, R. C.; Peacock, S. J.; Harrison, E. M.

Staphylococcus aureus is an important human pathogen but is primarily a commensal of the human nose and skin. Survival during colonisation is likely one of the major drivers of S. aureus evolution. Here we use a genome-wide mutation enrichment approach to analyse a genomic dataset of 3,060 S. aureus isolates from 791 individuals and show that, despite limited within-host genetic diversity, an excess of protein-altering mutations can be found in genes encoding key metabolic pathways, in regulators of quorum sensing and in known antibiotic targets. Nitrogen metabolism and riboflavin synthesis are the metabolic processes with the strongest evidence of adaptation. Further evidence of adaptation to nitrogen availability was revealed by enrichment of mutations in the assimilatory nitrite reductase and urease, including mutations that enhance growth with urea as the sole nitrogen source. Inclusion of an additional 4,090 genomes from 802 individuals revealed eight additional genes, including sasA/sraP, pstA, and rsbU, with signals of adaptive variation that warrant further characterisation. Our study provides the most comprehensive picture to date of the heterogeneity of adaptive changes that occur in the genomes of S. aureus during colonisation, revealing the likely importance of nitrogen metabolism, loss of quorum sensing and antibiotic resistance for successful human colonisation.

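The abstract does not spell out its mutation enrichment statistic. As a hedged illustration of the general idea only, the Python sketch below asks whether a gene carries more mutations than expected if mutations fell uniformly along the analysed sequence, using a one-sided Poisson test. The uniform-rate model, the function names and the toy numbers are all assumptions; published approaches usually also account for mutation spectra, coverage and gene-specific rates.

```python
import math


def poisson_upper_tail(k, lam):
    """P(X >= k) for X ~ Poisson(lam), with lam > 0."""
    if k <= 0:
        return 1.0
    if k <= lam:
        # Upper tail is large here, so compute 1 - P(X <= k - 1).
        term, cdf = math.exp(-lam), 0.0
        for i in range(k):
            cdf += term
            term *= lam / (i + 1)  # P(X = i + 1) from P(X = i)
        return max(0.0, 1.0 - cdf)
    # Upper tail is small: sum P(X = i) for i = k, k + 1, ... directly.
    term = math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))
    total, i = 0.0, k
    while term > total * 1e-16:
        total += term
        i += 1
        term *= lam / i  # terms shrink because i > lam
    return total


def mutation_enrichment(observed, gene_length, total_mutations, total_length):
    """Observed/expected ratio and one-sided Poisson p-value for one gene."""
    # Uniform-rate assumption: expected count scales with gene length.
    expected = total_mutations * gene_length / total_length
    return observed / expected, poisson_upper_tail(observed, expected)


# Toy numbers: 10,000 mutations over 2.5 Mb, so a 1 kb gene is expected to carry 4.
ratio, p = mutation_enrichment(12, 1_000, 10_000, 2_500_000)
print(ratio, p)
```

Summing the upper tail directly when k exceeds the expected count avoids computing 1 minus a number very close to 1, which would lose the small p-values that matter for enrichment. For production work, a library routine such as SciPy's Poisson survival function would be the usual choice.
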
Atlas of nascent RNA transcripts reveals enhancer to gene linkages

By: Sigauke, R. F.; Sanford, L.; Maas, Z. L.; Jones, T.; Stanley, J. T.; Townsend, H. A.; Allen, M. A.; Dowell, R. D.

Gene transcription is controlled and modulated by regulatory regions, including enhancers and promoters. These regions are rich in unstable, non-coding bidirectional transcription. Using nascent RNA transcription data across hundreds of human samples, we identified over 800,000 regions containing bidirectional transcription. We then identified highly correlated transcription between bidirectional and gene regions. The identified correlated pairs, each consisting of a bidirectional region and a gene, are enriched for disease-associated SNPs and are often supported by independent 3D data. We provide these data as an SQL database, a resource for future studies of gene regulation, enhancer-associated RNAs, and transcription factors.

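The pairing step above (highly correlated transcription between bidirectional and gene regions) can be illustrated with a correlation across samples. In the Python sketch below, the choice of Pearson correlation, the 0.8 cutoff and the region IDs are assumptions; the abstract does not state the exact statistic, and a genome-scale analysis would also restrict candidate pairs by distance and correct for multiple testing.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length per-sample vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length vectors with at least 2 samples")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return math.nan  # undefined for a constant vector; NaN fails any cutoff below
    return cov / (sx * sy)


def correlated_pairs(bidir, genes, threshold=0.8):
    """All (bidirectional region, gene, r) pairs with r >= threshold.

    bidir, genes: dict of region ID -> expression values in the same sample order.
    """
    pairs = []
    for b_id, b_vals in bidir.items():
        for g_id, g_vals in genes.items():
            r = pearson(b_vals, g_vals)
            if r >= threshold:
                pairs.append((b_id, g_id, r))
    return pairs


bidir = {"bidir_1": [1.0, 2.0, 3.0, 4.0]}
genes = {"gene_A": [2.0, 4.0, 6.0, 8.0], "gene_B": [4.0, 3.0, 2.0, 1.0]}
print(correlated_pairs(bidir, genes))
```

The all-against-all loop is only practical for a handful of regions; with hundreds of thousands of bidirectional regions, candidate genes would normally be limited to a genomic window first.
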
MLL3/MLL4 enzymatic activity shapes DNA replication timing

By: Goekbuget, D.; Boileau, R. M.; Lenshoek, K.; Blelloch, R.

Mammalian genomes are replicated during S phase in a precise, cell-type-specific order that correlates with local transcriptional activity, chromatin modifications and chromatin architecture. However, the causal relationships between these features and the key regulators of DNA replication timing (RT) are largely unknown. Here, machine learning was applied to quantify which chromatin features, including epigenetic marks, histone variants and chromatin architectural factors, best predict local RT at steady state and RT changes during early embryonic stem (ES) cell differentiation. About one-third of the genome exhibited RT changes during differentiation. Combined, chromatin features predicted steady-state RT and RT changes with high accuracy. Of these features, histone H3 lysine 4 monomethylation (H3K4me1) catalyzed by MLL3/4 (also known as KMT2C/D) emerged as a top predictor. Loss of Mll3/4 (but not Mll3 alone) or of their enzymatic activity resulted in erasure of genome-wide RT dynamics during ES cell differentiation. Sites that normally gain H3K4me1 in an MLL3/4-dependent fashion during the transition failed to shift towards earlier RT, often while their transcriptional activation was unaffected. Further analysis revealed a requirement for MLL3/4 in promoting DNA replication initiation zones through MCM2 recruitment, providing a direct link to its role in regulating RT. Our results uncover MLL3/4-dependent H3K4me1 as a functional regulator of RT and highlight a causal relationship between the epigenome and RT that is largely uncoupled from transcription. These findings reveal a previously unknown role for MLL3/4-dependent chromatin functions that is likely relevant to the numerous diseases associated with MLL3/4 mutations.

Cryptic endogenous retrovirus subfamilies in the primate lineage

By: Chen, X.; Zhang, Z.; Yan, Y.; Goubert, C.; Bourque, G.; Inoue, F.

Many endogenous retroviruses (ERVs) in the human genome are primate-specific and have contributed novel cis-regulatory elements and transcripts. However, current approaches for classifying and annotating ERVs and their long terminal repeats (LTRs) have limited resolution and are inaccurate. Here, we developed a new annotation based on phylogenetic analysis and cross-species conservation. Focusing on the evolutionarily young MER11A/B/C subfamilies, we revealed the presence of four phyletic groups that better explained the epigenetic heterogeneity observed within these subfamilies, suggesting a new annotation for 412 (19.8%) of the MER11 instances. Furthermore, we functionally validated the regulatory potential of these four phyletic groups using a massively parallel reporter assay (MPRA), which also identified motifs associated with their differential activities. Combining MPRA with phyletic groups across primates revealed an ape-specific gain of SOX-related motifs through a single-nucleotide deletion. Lastly, by applying our approach across 53 primate-specific LTR subfamilies, we determined the presence of 75 phyletic groups and found that 3,807 (30.0%) instances from 26 LTR subfamilies could be categorized into a novel phyletic group, many of which have a distinct epigenetic profile. Thus, our refined annotation of primate-specific LTRs should make it possible to better understand the evolution of primate genomes and potentially identify new roles for ERVs/LTRs in their hosts.

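MPRA activity is commonly summarized as the log2 ratio of normalized RNA (barcode transcript) counts to normalized DNA (plasmid) counts for each element, which can then be averaged per phyletic group. The Python sketch below shows that general calculation; the counts-per-million normalization, the pseudocount, the element IDs and the group labels are assumptions rather than the authors' pipeline.

```python
import math
from statistics import mean


def mpra_activity(rna_counts, dna_counts, pseudocount=1.0):
    """log2(RNA CPM / DNA CPM) for each element present in the DNA library."""
    rna_total = sum(rna_counts.values())
    dna_total = sum(dna_counts.values())
    activity = {}
    for element, dna in dna_counts.items():
        rna = rna_counts.get(element, 0)
        # Counts per million; the pseudocount avoids log(0) and division by zero.
        rna_cpm = (rna + pseudocount) / rna_total * 1e6
        dna_cpm = (dna + pseudocount) / dna_total * 1e6
        activity[element] = math.log2(rna_cpm / dna_cpm)
    return activity


def group_activity(activity, group_of):
    """Mean activity per group, e.g. per hypothetical phyletic group label."""
    groups = {}
    for element, value in activity.items():
        groups.setdefault(group_of[element], []).append(value)
    return {group: mean(values) for group, values in groups.items()}


act = mpra_activity({"elem1": 200, "elem2": 50}, {"elem1": 100, "elem2": 100})
print(group_activity(act, {"elem1": "group_A", "elem2": "group_B"}))
```

Normalizing each library to counts per million before taking the ratio keeps differences in sequencing depth between the RNA and DNA libraries from masquerading as regulatory activity.
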
A Small Genome Amidst the Giants: Evidence of Genome Reduction in a Small Tubulinid Free-Living Amoeba

By: Tekle, Y.; Tefera, H.

This study investigates the genomic characteristics of Echinamoeba silvestris, a small amoeba within the Tubulinea clade of the Amoebozoa supergroup. Despite the significance of Tubulinea in various fields, genomic data for this clade have been scarce. E. silvestris has the smallest free-living amoeba genome reported to date within Tubulinea and Amoebozoa. Comparative analysis reveals intriguing parallels with parasitic lineages in genome size and predicted gene number, emphasizing the need to understand the consequences of reduced genomes in free-living amoebae. Functional categorization of predicted genes in E. silvestris shows percentages of ortholog groups similar to those of other amoebae across various categories; a distinctive feature, however, is the extensive gene contraction in orphan (ORFan) genes and in genes involved in biological processes. Notably, none of the few genes that underwent expansion are related to cellular components, suggesting adaptive processes that streamline biological processes and cellular components for efficiency and energy conservation. The study also examines genomic structural evidence, including gene content and repetitive elements, which illuminates the distinctive genomic traits of E. silvestris and further supports its compact genome size. Overall, this research underscores the diversity within Tubulinea, highlights knowledge gaps in Amoebozoa genomics, and positions E. silvestris as a valuable addition to genomic datasets, prompting further exploration of the complexities of Amoebozoa diversity and genome evolution.

An improved haplotype resolved genome reveals more rice genes

By: Abdullah, M.; Furtado, A.; Masouleh, A.; Okemo, P.; Henry, R. J.

The rice reference genome (Oryza sativa ssp. japonica cv. Nipponbare) has been an important resource in plant science. We now report an improved, haplotype-resolved genome sequence based on more accurate sequencing technology. This improved assembly includes regions missing from earlier genome sequences, and its greater sequence accuracy has allowed the annotation of more than 3,000 new genes.