SeroBA(v2.0) and SeroBAnk: a robust genome-based serotyping scheme and comprehensive atlas of capsular diversity in Streptococcus pneumoniae
Lorenz, O.; King, A. C.; Hung, H. C. H.; Ganaie, F. A.; Wyllie, A. L.; Manna, S.; Satzke, C.; van der Linden, M.; Ravenscroft, N.; Slotved, H.-C.; McGee, L.; Nahm, M. H.; Bentley, S. D.; Lo, S. W.
Abstract
The unprecedented number of Streptococcus pneumoniae (the pneumococcus) genomes sequenced in recent years has accelerated the discovery of novel serotypes and highlighted the genetic diversity both between and within serotypes. A novel serotype should demonstrate a distinct capsular polysaccharide biosynthetic (cps) locus, capsular structure, and serological profile. In the past four years alone, nine new serotypes have been identified. Accurate and timely serotyping of pneumococcal isolates is key to understanding the global distribution and evolution of the pneumococcus, and the response of the bacterial population to vaccination. However, current bioinformatics serotyping tools are infrequently updated and struggle to accommodate newly discovered serotypes in a timely manner. To address these limitations, we built a comprehensive, curated library (SeroBAnk) encompassing all known pneumococcal serotypes; this resource is presented as an atlas on a dedicated, publicly accessible webpage (https://www.pneumogen.net/gps/#/serobank). Building upon this resource, we developed SeroBA(v2.0), a tool with an easy-to-update database that can accurately identify 102 of the 107 known pneumococcal serotypes (all except serotypes 24B, 24C, 24F, 7D and 6H) and 18 genetic subtypes within serotypes 6A, 6B, 11A, 19A, 19F and 33F. We validated SeroBA(v2.0) on 26,306 genomes from the Global Pneumococcal Sequencing project, on reference isolates, and on simulated reads derived from the reference genetic sequences of the cps locus, and showed that SeroBA(v2.0) can reliably detect the nine recently discovered serotypes. Additionally, we show that in silico serotypes inferred by SeroBA(v2.0) had high concordance with phenotypic serotypes determined by either Quellung or latex agglutination, both at the serotype level (88.9%; 15,945/17,933) and at the serogroup level (91.9%; 16,480/17,933).
Finally, we propose a community-contribution-based approach to ensure that SeroBA(v2.0) is maintained and updated as novel serotypes continue to be discovered. The global community can submit putative novel serotypes through our public repository on GitHub (https://github.com/GlobalPneumoSeq/seroba/issues). Submitted putative novel serotypes will be curated by experts in the field, based on the genetic sequence of the cps region, the capsular structure, and the serological profile. SeroBA(v2.0) can be accessed at https://github.com/GlobalPneumoSeq/seroba.
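As a quick check on the concordance figures reported in the abstract, the percentages can be recomputed from the stated counts. The snippet below is only an illustrative sketch: the function and variable names are hypothetical and are not part of SeroBA, and the serogroup-only count assumes that every serotype-level match also counts as a serogroup-level match.

```python
def concordance(matches: int, total: int) -> float:
    """Return the percentage of isolates whose in silico call agrees
    with the phenotypic (Quellung or latex agglutination) result."""
    if total <= 0:
        raise ValueError("total must be a positive number of isolates")
    return 100.0 * matches / total


# Counts taken from the abstract; names are hypothetical.
TOTAL_ISOLATES = 17_933
SEROTYPE_MATCHES = 15_945
SEROGROUP_MATCHES = 16_480

serotype_pct = concordance(SEROTYPE_MATCHES, TOTAL_ISOLATES)
serogroup_pct = concordance(SEROGROUP_MATCHES, TOTAL_ISOLATES)

# Assumption: serotype-level matches are a subset of serogroup-level matches,
# so the difference is the number of isolates concordant only at serogroup level.
serogroup_only = SEROGROUP_MATCHES - SEROTYPE_MATCHES

print(f"Serotype-level concordance:  {serotype_pct:.1f}%")
print(f"Serogroup-level concordance: {serogroup_pct:.1f}%")
print(f"Concordant at serogroup level only: {serogroup_only} isolates")
```

Rounded to one decimal place, the two percentages agree with the values quoted in the abstract.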