Bridging Worlds: Connecting Glycan Representations with Glycoinformatics via Universal Input and a Canonicalized Nomenclature
Urban, J.; Joeres, R.; Bojar, D.
Abstract
As the field of glycobiology has developed, so too have different glycan nomenclature systems, reflecting the diverse cognitive and practical needs of different scientific applications. While each system serves specific purposes, this multiplicity creates challenges for usability, data integration, and knowledge sharing. Here, we present a practical framework for automated nomenclature conversion that accepts any nomenclature as input, without requiring the user to declare which one is being used, and returns a canonicalized IUPAC-condensed format as a standardized output representation. Our implementation handles (i) all common nomenclatures, including frequent typos, (ii) complex cases including structural ambiguities, modifications, and uncertainty in linkage information, and (iii) different compositional representations. This Universal Input framework can translate more than 10 nomenclatures in less than 1 ms and was tested on over 50,000 sequences with 95-100% coverage, enabling seamless integration of existing glycan databases and tools while maintaining the specific advantages of each representation system.
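To make the idea of undeclared input concrete, the following is a minimal sketch of format detection followed by canonicalization. It is not the implementation described here: the function names (`detect_nomenclature`, `canonicalize_linkages`, `to_canonical`), the detection heuristics, and the choice of parenthesized linkages as the target form are all assumptions made for illustration, and only a trivial IUPAC-like case is actually converted.

```python
import re

# Hypothetical pattern for a glycosidic linkage such as "b1-4" or "(a2-3)".
# Anomer is a/b/?, positions are single digits or "?" for unknown linkage.
_LINKAGE = re.compile(r"\(?([ab?])([\d?])-([\d?])\)?")


def detect_nomenclature(s: str) -> str:
    """Guess the nomenclature of a glycan string (toy heuristics, assumed)."""
    s = s.strip()
    if s.startswith("WURCS="):
        return "WURCS"
    if s.startswith("RES"):
        return "GlycoCT"
    # Compositions such as "Hex5HexNAc4": repeated name+count pairs only.
    if re.fullmatch(r"(?:[A-Za-z]+\d+)+", s):
        return "composition"
    if _LINKAGE.search(s):
        return "IUPAC-like"
    return "unknown"


def canonicalize_linkages(s: str) -> str:
    """Wrap every linkage in parentheses, e.g. "Galb1-4GlcNAc" -> "Gal(b1-4)GlcNAc"."""
    return _LINKAGE.sub(
        lambda m: f"({m.group(1)}{m.group(2)}-{m.group(3)})", s.strip()
    )


def to_canonical(s: str) -> str:
    """Dispatch on the detected nomenclature; only the IUPAC-like case is sketched."""
    kind = detect_nomenclature(s)
    if kind == "IUPAC-like":
        return canonicalize_linkages(s)
    raise NotImplementedError(f"No converter sketched for {kind!r} input")


print(to_canonical("Galb1-4GlcNAc"))
```

The point of the sketch is the dispatch pattern: the caller never states the input format, and the canonicalization step is idempotent, so already-canonical strings pass through unchanged. A real converter would need dedicated parsers for each nomenclature as well as handling for typos, modifications, and structural ambiguity.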