Whole-genome prediction of bacterial pathogenic capacity on novel bacteria using protein language models, with PathogenFinder2

This paper is a preprint and has not been certified by peer review.

Authors

Ferrer Florensa, A.; Almagro Armenteros, J. J.; Kaas, R. S.; Clausen, P. T.; Nielsen, H.; Rost, B.; Aarestrup, F. M.

Abstract

Infectious diseases remain a leading cause of mortality and a significant global health threat, so tools for surveillance and early detection of emerging pathogens are needed. In this study, we introduce PathogenFinder2, a novel predictor of bacterial pathogenic capacity in humans, available through an online server (http://genepi.food.dtu.dk/pathogenfinder2) or as a standalone program (https://github.com/genomicepidemiology/PathogenFinder2). The model uses protein language models for whole-genome phenotype prediction and surpasses the performance of previous methods, especially for novel bacterial taxa, while being taxonomy-agnostic and alignment-free. It also predicts the importance of each protein for the pathogenic capacity. This output might aid in characterizing potential pathogens: it readily identifies new candidates for virulence factors and vaccine targets and offers insights into the metabolic pathways involved in infection. Furthermore, we introduce the Bacterial Pathogenic Landscape, which reveals distributions related to host conditions, antagonist bacteria, infection site, or habitat.
