Domain classification of archaeal proteomes reveals conserved fold repertoire

This paper is a preprint and has not been certified by peer review.

Authors

Schaeffer, R. D.; Pei, J.; Guo, R.; Zhang, J.; Medvedev, K.; Cong, Q.; Grishin, N.

Abstract

Archaea represent one of the three domains of cellular life, yet they account for fewer than 1% of experimentally determined protein structures, leaving the extent of their structural novelty unknown. Here we present a systematic domain-level classification of 124,075 proteins from 65 archaeal classes spanning 21 phyla and all major lineages, using both AlphaFold Database (AFDB) models and newly predicted AlphaFold3 structures classified against the Evolutionary Classification of protein Domains (ECOD). We assigned 204,758 domains, of which 76.8% received high-confidence classifications, spanning 987 ECOD X-groups and capturing 40% of known structural diversity within a single domain of life. Clustering by Foldseek recovered structural relationships for 63% of domains that are singletons by sequence comparison. To characterize the 21% of proteins lacking high-confidence classification, we applied successive filters for structure prediction confidence, protein length, and structural cluster context, reducing 8,452 domain-free proteins to a small number of well-folded structural orphans (less than 0.1% of the dataset). The unclassified fraction is dominated by sub-threshold matches to known folds (14% of all proteins) and low-confidence structure predictions (5%), not by novel structures. These results demonstrate that the protein fold repertoire at the single-domain level is broadly conserved across the deepest phylogenetic distances in cellular life, and that the gap between archaeal and well-characterized proteomes reflects classification sensitivity for divergent sequences rather than unexplored structural diversity.
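
The successive filtering of domain-free proteins described above can be illustrated with a minimal sketch. The abstract does not state the cutoffs or the data layout, so everything named here is an assumption made for illustration: the `Protein` record, the `filter_orphans` function, and the three thresholds (mean pLDDT of 70, length of 50 residues, structural cluster size of 1) are hypothetical and are not the authors' values or code.

```python
from dataclasses import dataclass

# Illustrative thresholds only; the paper's actual cutoffs are not given
# in the abstract, so these values are assumptions.
MIN_MEAN_PLDDT = 70.0   # assumed cutoff for a confident structure prediction
MIN_LENGTH = 50         # assumed minimum protein length in residues
MAX_CLUSTER_SIZE = 1    # assumed: an orphan has no structural neighbours


@dataclass(frozen=True)
class Protein:
    """Hypothetical record for one domain-free protein."""
    protein_id: str
    mean_plddt: float
    length: int
    cluster_size: int  # members of its structural cluster, itself included


def filter_orphans(proteins):
    """Apply the three filters in order.

    Returns the surviving proteins and the count remaining after each stage,
    so the contribution of every filter can be inspected.
    """
    stages = [
        ("confident_structure", lambda p: p.mean_plddt >= MIN_MEAN_PLDDT),
        ("long_enough", lambda p: p.length >= MIN_LENGTH),
        ("isolated_cluster", lambda p: p.cluster_size <= MAX_CLUSTER_SIZE),
    ]
    remaining = list(proteins)
    counts = {"input": len(remaining)}
    for name, keep in stages:
        remaining = [p for p in remaining if keep(p)]
        counts[name] = len(remaining)
    return remaining, counts
```

Recording the count after each stage, rather than only the final survivors, mirrors the way the abstract attributes the unclassified fraction to separate causes such as low-confidence predictions.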
