A molecular representation system with a common reference frame for natural products pathway discovery and structural diversity tasks.

Avatar
Poster
Voice is AI-generated
Connected to paperThis paper is a preprint and has not been certified by peer review

A molecular representation system with a common reference frame for natural products pathway discovery and structural diversity tasks.

Authors

Babineau, N.; Nguyen, L. T. D.; Mathieu, D.; McCue, C.; Schlecht, N.; Abrahamson, T.; Hamberger, B.; Busta, L.

Abstract

Researchers have uncovered hundreds of thousands of natural products, many of which contribute to medicine, materials, and agriculture. However, missing knowledge of the biosynthetic pathways to these products hinders their expanded use. Nucleotide sequencing is key in pathway elucidation efforts, and analyses of natural products\' molecular structures, though seldom discussed explicitly, also play an important role by suggesting hypothetical pathways for testing. Structural analyses are also important in drug discovery, where many molecular representation systems - methods of representing molecular structures in a computer-friendly format - have been developed. Unfortunately, pathway elucidation investigations seldom use these representation systems. This gap is likely because those systems are primarily built to document molecular connectivity and topology, rather than the absolute positions of bonds and atoms in a common reference frame, the latter of which enables chemical structures to be connected with potential underlying biosynthetic steps. Here, we present a unique molecular representation system built around a common reference frame. We tested this system using triterpenoid structures as a case study and explored the system\'s applications in biosynthesis and structural diversity tasks. The common reference frame system can identify structural regions of high or low variability on the scale of atoms and bonds and enable hierarchical clustering that is closely connected to underlying biosynthesis. Combined with phylogenetic distribution information, the system illuminates distinct sources of structural variability, such as different enzyme families operating in the same pathway. These characteristics outline the potential of common reference frame molecular representation systems to support large-scale pathway elucidation efforts.

Follow Us on

0 comments

Add comment