Mitigating Family Effects in RNA Secondary-Structure Prediction with Latent-Space Continual Learning
Mokkedem, W.; Pedrielli, G.; Wu, T.
Abstract
Accurate RNA secondary-structure prediction remains difficult despite decades of thermodynamics-based algorithms and the advent of deep-learning architectures (convolutional networks, Transformers, diffusion models). A key obstacle is that the datasets pairing RNA sequences with secondary-structure labels are often low-quality, noisy, and family-imbalanced; this limits out-of-distribution generalization and exacerbates catastrophic forgetting when new data regimes are introduced. We propose RNAFoLBO, a continual-learning approach based on Lifelong Bayesian Optimization (LBO) that treats each class of RNAs obtained from latent-space clustering as a sequential task and jointly orchestrates the training and hyperparameter selection of heterogeneous models (UFold, RNA-FM, RNADiffFold) while preserving prior knowledge. Concretely, we apply LBO to 15 clusters obtained by clustering RNAStrAlign in the latent space of RNAGenesis, a model specialized in contextual representation learning and latent-space structuring, and achieve a mean per-cluster F1 score of 0.931 (range 0.177). These results surpass the strongest one-shot baseline and mitigate forgetting without full retraining, and the gains persist as additional clusters are introduced. Overall, RNAFoLBO delivers higher and more stable performance and scales practically to new RNA clusters or families, enabling more robust and transferable RNA secondary-structure prediction.
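To make the reported summary statistic concrete, the following is a minimal Python sketch under stated assumptions, not the authors' evaluation code. It assumes that F1 is computed per sequence over predicted versus reference base pairs, averaged within each cluster, and that the reported "range" is the spread (maximum minus minimum) of the per-cluster F1 values. The function names `base_pair_f1` and `summarize_clusters` are hypothetical.

```python
from statistics import mean


def base_pair_f1(pred_pairs, true_pairs):
    """F1 over sets of (i, j) base pairs; pair orientation is normalized.

    Assumption: base-pair-level F1, a common RNA secondary-structure metric.
    Returns 0.0 when no predicted pair matches a reference pair.
    """
    # Normalize (j, i) to (i, j) so pair orientation does not matter.
    pred = {tuple(sorted(p)) for p in pred_pairs}
    true = {tuple(sorted(p)) for p in true_pairs}
    tp = len(pred & true)  # correctly predicted base pairs
    if tp == 0:
        return 0.0
    precision = tp / len(pred)
    recall = tp / len(true)
    return 2 * precision * recall / (precision + recall)


def summarize_clusters(cluster_scores):
    """Return (mean of per-cluster F1, max minus min of per-cluster F1).

    cluster_scores: hypothetical dict mapping cluster id -> list of
    per-sequence F1 values for that cluster.
    """
    per_cluster = {c: mean(scores) for c, scores in cluster_scores.items()}
    values = list(per_cluster.values())
    return mean(values), max(values) - min(values)


if __name__ == "__main__":
    # Toy example: 2 of 3 predicted pairs are correct, so F1 = 2/3.
    print(base_pair_f1([(1, 10), (2, 9), (3, 8)], [(1, 10), (2, 9), (4, 7)]))
    # Two toy clusters with per-cluster means 0.9 and 0.6.
    print(summarize_clusters({"A": [1.0, 0.8], "B": [0.6]}))
```

Averaging within clusters first, rather than pooling all sequences, keeps a large, easy family from masking poor performance on a small one, which is the family-imbalance concern the abstract raises.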