UCSC-SOE-12-12: Bayesian Semiparametric Regression Models to Characterize Molecular Evolution

Saheli Datta, Abel Rodriguez, Raquel Prado
08/13/2012 06:18 PM
Applied Mathematics & Statistics
Background: Statistical models and methods that associate changes in the physicochemical properties of amino acids with natural selection at the molecular level typically do not take into account the correlations between such properties. We propose a Bayesian hierarchical regression model that describes the relationship between changes in amino acid distances and natural selection in protein-coding DNA sequence alignments, using a generalization of the Dirichlet process as the prior on the distribution of the regression coefficients.
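As a rough illustration of how a Dirichlet process prior on regression coefficients can group properties, the sketch below draws one coefficient per amino acid property from a truncated stick-breaking approximation of a standard Dirichlet process. The report uses a generalization of the Dirichlet process, so this is not the authors' model: the plain DP, the normal base measure G0 = N(0, base_sd^2), the truncation level, and the function names `stick_breaking_weights` and `draw_coefficients` are all hypothetical choices made here for illustration. Properties assigned to the same atom share exactly the same coefficient, which is the mechanism that lets such a prior treat them as having a similar effect.

```python
import random


def stick_breaking_weights(alpha, truncation, rng):
    """Truncated stick-breaking weights for a DP(alpha, G0) (illustrative sketch)."""
    weights = []
    remaining = 1.0  # length of the stick not yet broken off
    for _ in range(truncation - 1):
        v = rng.betavariate(1.0, alpha)  # V_k ~ Beta(1, alpha)
        weights.append(remaining * v)
        remaining *= 1.0 - v
    weights.append(remaining)  # last atom absorbs the leftover mass so weights sum to 1
    return weights


def draw_coefficients(num_properties, alpha=1.0, truncation=20, base_sd=1.0, seed=0):
    """Draw one regression coefficient per property from a truncated DP prior.

    Assumption: base measure G0 is Normal(0, base_sd^2); the report's actual
    prior is a generalization of the DP and is not reproduced here.
    """
    rng = random.Random(seed)
    weights = stick_breaking_weights(alpha, truncation, rng)
    atoms = [rng.gauss(0.0, base_sd) for _ in range(truncation)]  # atom locations from G0
    # Each property picks an atom with probability equal to its stick-breaking weight.
    labels = rng.choices(range(truncation), weights=weights, k=num_properties)
    betas = [atoms[k] for k in labels]  # properties sharing a label share a coefficient
    return labels, betas


if __name__ == "__main__":
    labels, betas = draw_coefficients(num_properties=10, alpha=1.0, seed=1)
    clusters = {}
    for prop, label in enumerate(labels):
        clusters.setdefault(label, []).append(prop)
    for label, props in clusters.items():
        print(f"properties {props} share coefficient {atoms_value:.3f}".replace(
            "atoms_value", "betas[props[0]]") if False else
            f"properties {props} share coefficient {betas[props[0]]:.3f}")
```

Smaller values of `alpha` concentrate the mass on fewer atoms and so tend to produce fewer, larger groups of properties; in a full analysis the group assignments would be inferred from the alignment data rather than drawn from the prior.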

Results: The Bayesian semiparametric approach is illustrated with simulated data and the abalone sperm lysin data. For this particular dataset, our method identifies groups of properties that have a similar effect on evolution. The model also provides nonparametric site-specific estimates of the strength of conservation of these properties.

Conclusions: The model described here is distinguished by its ability to handle a large number of amino acid properties simultaneously, while taking into account that such data can be correlated. The multi-level clustering ability of the model allows for appealing interpretations of the results in terms of properties that are roughly equivalent from the standpoint of molecular evolution.
