UCSC-CRL-92-23: PROTEIN MODELING USING HIDDEN MARKOV MODELS: ANALYSIS OF GLOBINS

06/01/1992 09:00 AM
Biomolecular Engineering
We apply Hidden Markov Models (HMMs) to the problem of statistical modeling and multiple sequence alignment of protein families. A variant of the Expectation Maximization (EM) algorithm known as the Viterbi algorithm is used to obtain the statistical model from the unaligned sequences. In a detailed series of experiments, we have taken 400 unaligned globin sequences, and produced a statistical model entirely automatically from the primary (unaligned) sequences. We use no prior knowledge of globin structure. Using this model, we obtained a multiple alignment of the 400 sequences and 225 other globin sequences that agrees almost perfectly with a structural alignment by Bashford et al. This model can also discriminate all these 625 globins from nonglobin protein sequences with greater than 99% accuracy, and can thus be used for database searches.
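As an illustration only, the following is a minimal sketch of Viterbi decoding, the dynamic-programming step that finds the most likely hidden-state path for one sequence and that Viterbi-style training and alignment rely on. It is not the model architecture or code used in the report: the two states (`hydrophobic`, `polar`), the two-letter alphabet, and all probabilities are hypothetical values chosen so the example is easy to check by hand.

```python
import math


def viterbi(obs, states, log_start, log_trans, log_emit):
    """Return (best_log_prob, best_path) for one observation sequence."""
    # best[s] = log-probability of the best path so far that ends in state s
    best = {s: log_start[s] + log_emit[s][obs[0]] for s in states}
    back = []  # one dict of back-pointers per step after the first
    for symbol in obs[1:]:
        prev = best
        best, pointers = {}, {}
        for s in states:
            # Best predecessor r for arriving in state s at this step
            r = max(states, key=lambda q: prev[q] + log_trans[q][s])
            best[s] = prev[r] + log_trans[r][s] + log_emit[s][symbol]
            pointers[s] = r
        back.append(pointers)
    # Trace back from the best final state
    last = max(states, key=lambda s: best[s])
    path = [last]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return best[last], path


def logs(table):
    """Convert a dict of probabilities to log-probabilities."""
    return {k: math.log(v) for k, v in table.items()}


# Hypothetical toy model (not taken from the report)
states = ("hydrophobic", "polar")
log_start = logs({"hydrophobic": 0.5, "polar": 0.5})
log_trans = {
    "hydrophobic": logs({"hydrophobic": 0.7, "polar": 0.3}),
    "polar": logs({"hydrophobic": 0.3, "polar": 0.7}),
}
log_emit = {
    "hydrophobic": logs({"L": 0.8, "K": 0.2}),
    "polar": logs({"L": 0.2, "K": 0.8}),
}

log_prob, path = viterbi("LLKK", states, log_start, log_trans, log_emit)
print(path)
```

Working in log space avoids numerical underflow on long sequences. In Viterbi-style training, paths like this one would be used to re-estimate the transition and emission probabilities, and the process repeated until the model stops changing.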
