COSC 348: Lab06
Hidden Markov Models (HMMs)
Work through the section on Scoring a Sequence with an HMM, up to the forward algorithm, in the online tutorial by Rachel Karchin from the Department of Bioinformatics and Computational Biology, University of California, Santa Cruz, USA: Hidden Markov Models and Protein Sequence Analysis.
(Rachel has given permission for us to use her tutorial material.)
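If it helps to see the forward algorithm as code while you read, here is a minimal Python sketch. It is not taken from the tutorial. It assumes a plain two-state HMM with no silent (delete) states and no explicit END state, and the state names and all probabilities are invented purely for illustration. A profile HMM like the one in the tutorial needs extra handling for delete states.

```python
def forward(seq, states, start_p, trans_p, emit_p):
    """Return P(seq | model), summed over all state paths."""
    # f[s] = probability of emitting the prefix read so far and ending in state s
    f = {s: start_p[s] * emit_p[s][seq[0]] for s in states}
    for symbol in seq[1:]:
        # Extend every path by one step: sum over predecessor states r,
        # then multiply by the probability that s emits the next symbol.
        f = {
            s: emit_p[s][symbol] * sum(f[r] * trans_p[r][s] for r in states)
            for s in states
        }
    return sum(f.values())


# Hypothetical toy model (all numbers are made up for illustration).
states = ("H", "P")
start_p = {"H": 0.5, "P": 0.5}
trans_p = {"H": {"H": 0.8, "P": 0.2}, "P": {"H": 0.3, "P": 0.7}}
emit_p = {
    "H": {"A": 0.6, "L": 0.3, "S": 0.1},
    "P": {"A": 0.2, "L": 0.1, "S": 0.7},
}

print(forward("ALS", states, start_p, trans_p, emit_p))
```

The function keeps only the current column of the dynamic-programming table, which is enough to score a sequence. Real implementations usually work in log space to avoid underflow on long sequences.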
IMPORTANT
Save the last 30 minutes or so for writing and submitting a one-page report in which you answer the questions below.
Remember, today's lab work and report are worth 1% of your final mark.
In your report, try to answer these questions (time permitting):
Given an HMM with 3 match states, 3 delete states and 4 insert states
(as in Figure 8), what is the minimum number of states that a "random walk" will visit?
What is the maximum? (Do not count the BEGIN and END states.)
If a "random walk" generates a sequence of L amino acids, and we define N
as the number of states visited on that path, is it true that L is always
less than or equal to N? Why?
In a pairwise alignment, a higher penalty is generally imposed for
opening a gap than for extending one. Is this feature present in a
profile HMM? Why?
Why are HMMs better suited to finding distant homologs than profiles based
on position-specific probability distributions?
An HMM is very sensitive to the quality of the training set. Discuss
briefly how each of the following limitations affects the resulting model, and
suggest a possible solution.
1) Training sequences are not evenly distributed in the sequence space of
their class
2) There are too many sequences in the training set and they cover a huge
space
3) There are only a few sequences available for the training set