An efficient deep learning method for amino acid substitution model selection

This paper is a preprint and has not been certified by peer review.

Authors

Nguyen Huy, T.; Vinh, L. S.

Abstract

Amino acid substitution models play an important role in studying evolutionary relationships among species from protein sequences. Because an amino acid substitution model has a large number of parameters, it is estimated from hundreds or thousands of alignments. Both general models and clade-specific models have been estimated and are widely used in phylogenetic analyses. The maximum likelihood method is normally used to select the best-fit model for a specific protein alignment under study. A number of studies have discussed the theoretical concerns as well as the computational burden of maximum likelihood methods for model selection. Recently, machine learning methods have been proposed for selecting nucleotide substitution models. In this paper, we propose methods to create summary statistics from protein alignments in order to efficiently train a network, called ModelDetector, that is based on the convolutional neural network ResNet-18 and detects amino acid substitution models. Experiments on simulated data showed that the accuracy of ModelDetector was comparable to that of the maximum likelihood method ModelFinder. The ModelDetector network was trained on 64,800 alignments on a computer with 8 cores (without a GPU) in about 12 hours. It is orders of magnitude faster than the maximum likelihood method in inferring amino acid substitution models and can analyze genome alignments with millions of sites in minutes.
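The abstract does not say which summary statistics ModelDetector uses. The sketch below is a purely hypothetical illustration of the general idea: it turns an alignment of arbitrary size into a fixed-size, image-like input that a CNN such as ResNet-18 could consume. It pools counts of aligned amino acid pairs over all sequence pairs into a 20 × 20 relative-frequency matrix. The function name `pair_frequency_matrix`, the residue ordering, and the choice of statistic are all assumptions, not the authors' method.

```python
from itertools import combinations

# Hypothetical ordering of the 20 standard amino acids (an assumption).
AMINO_ACIDS = "ARNDCQEGHILKMFPSTWYV"
INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def pair_frequency_matrix(alignment):
    """Return a symmetric 20x20 matrix of relative frequencies of aligned
    amino acid pairs, pooled over all sequence pairs and all sites.

    Hypothetical summary statistic for illustration only; gaps and
    ambiguous characters (anything outside AMINO_ACIDS) are skipped.
    """
    if len({len(seq) for seq in alignment}) > 1:
        raise ValueError("all sequences must have the same length")
    counts = [[0.0] * 20 for _ in range(20)]
    total = 0
    for seq_a, seq_b in combinations(alignment, 2):
        for a, b in zip(seq_a.upper(), seq_b.upper()):
            i, j = INDEX.get(a), INDEX.get(b)
            if i is None or j is None:
                continue  # skip gaps and ambiguous residues
            # Count both orientations so the matrix stays symmetric.
            counts[i][j] += 1
            counts[j][i] += 1
            total += 2
    if total == 0:
        return counts  # no informative site pairs: all-zero matrix
    return [[c / total for c in row] for row in counts]


if __name__ == "__main__":
    m = pair_frequency_matrix(["ARND", "ARNE", "AKND"])
    print(m[0][0])  # share of A-A pairs among all counted pairs
```

The output is always 20 × 20, whatever the number of sequences or sites. One such matrix, or a stack of several computed under different (hypothetical) groupings, could therefore serve as a fixed-size input channel for a CNN. This naive pooling over every sequence pair costs time proportional to the number of sequence pairs times the number of sites.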
