Bridging Worlds: Connecting Glycan Representations with Glycoinformatics via Universal Input and a Canonicalized Nomenclature

This paper is a preprint and has not been certified by peer review.

Authors

Urban, J.; Joeres, R.; Bojar, D.

Abstract

As glycobiology has developed, so have its glycan nomenclature systems, each reflecting the cognitive and practical needs of a different scientific use. While each system serves specific purposes, this multiplicity creates challenges for usability, data integration, and knowledge sharing. Here, we present a practical framework for automated nomenclature conversion that accepts any nomenclature as input, without requiring the user to declare which one it is, and uses a canonicalized IUPAC-condensed format as the standardized output representation. Our implementation handles (i) all common nomenclatures, including common typos, (ii) complex cases including structural ambiguities, modifications, and uncertainty in linkage information, and (iii) different compositional representations. This Universal Input framework can translate more than 10 nomenclatures in less than 1 ms, was tested on over 50,000 sequences with 95-100% coverage, and enables seamless integration of existing glycan databases and tools while maintaining the specific advantages of each representation system.
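
To make the idea of input without a declared nomenclature concrete, the following is a minimal sketch and not the authors' implementation. It assumes only a few well-known surface cues: WURCS strings begin with "WURCS=", GlycoCT strings begin with a "RES" section, and IUPAC-like strings contain linkages such as b1-4. The function names detect_nomenclature and normalize_iupac_like are hypothetical, and the normalization covers only Greek anomer letters and missing parentheses around linkages, far less than the framework described above.

```python
import re

# Hypothetical helpers for illustration only; not the framework's actual API.

def detect_nomenclature(text: str) -> str:
    """Guess the nomenclature of a glycan string from simple surface cues."""
    s = text.strip()
    if s.startswith("WURCS="):
        return "WURCS"
    if s.startswith("RES"):
        return "GlycoCT"
    ascii_form = s.replace("α", "a").replace("β", "b")
    # A linkage looks like a1-3, b1-4, or ?1-? (unknown anomer or position)
    if re.search(r"[ab?][1-9?]-[1-9?]", ascii_form):
        return "IUPAC-like"
    return "unknown"

# Matches a linkage that is not already wrapped in parentheses
_BARE_LINKAGE = re.compile(r"(?<!\()([ab?][1-9?]-[1-9?])(?!\))")

def normalize_iupac_like(text: str) -> str:
    """Map Greek anomer letters to a/b and wrap bare linkages in parentheses."""
    s = text.strip().replace("α", "a").replace("β", "b")
    return _BARE_LINKAGE.sub(r"(\1)", s)

if __name__ == "__main__":
    for raw in ["Galβ1-4GlcNAc", "Fuca1-2Gal(b1-4)GlcNAc"]:
        print(detect_nomenclature(raw), normalize_iupac_like(raw))
```

In this sketch the detector dispatches on cheap prefix checks before falling back to a linkage pattern, so a caller never has to name the input format; a full converter would additionally parse WURCS and GlycoCT into a structure before emitting IUPAC-condensed text.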