IMPORTANT: submit the code and its output, i.e. 20 generated and scored sequences. Worth 1%.
If you do not finish the coding during the lab, you can continue at home and submit later by the end of the week.
Write a code that will generate and score 20 random DNA sequences based on the HMM provided below.
A random sequence will correspond to a random walk starting on the "Start" state and finishing
on the "End" state.
Di, Ii, and Mi are the deletion, insertion and
match states respectively. The transitions (arrows) from a state q to a state r will have a
probability T(r|q) which can be found in the Figure. These transition probabilities
will favour some paths more than others.
The following Table lists the probability distributions P(x|q) of the DNA bases in
state q.
| A | C | G | T | |
| M1 | 0.10 | 0.80 | 0.04 | 0.06 |
| M2 | 0.40 | 0.55 | 0.04 | 0.01 |
| M3 | 0.73 | 0.18 | 0.02 | 0.07 |
| I1 | 0.15 | 0.29 | 0.33 | 0.23 |
| I2 | 0.44 | 0.51 | 0.02 | 0.03 |
| I3 | 0.70 | 0.08 | 0.19 | 0.03 |
| I4 | 0.01 | 0.01 | 0.08 | 0.90 |
Note: You may want to use logarithms for scoring your random sequences.
/* The C-style pseudo code for state transitions */
double unifRand() {
return rand() / double(RAND_MAX);
}
int main(){
seed(); // new random seed
state[0] = Start;
rnd = unifRand(); // generate a random number between 0 and 1 according to uniform distribution
if (0.0 < rnd <= 0.3) state[1] = D1;
if (0.3 < rnd <= 0.7) state[1] = I1;
if (0.7 < rnd <= 1.0) state[1] = M1;
until (state[t] = End){
rnd = unifRand();
// procedure for transition to a new state comes here
if(state[t] = M1){
rnd = unifRand();
// procedure to generate output token using M probabilities comes here
}
if(state[t] = I1){
rnd = unifRand();
// procedure to generate output token using I probabilities comes here
}
if(state[t] = D1){
return "-"; // return gap
}
t = t + 1; // move to the next position in the sequence
}
return(0);
}
/* The C-style pseudo code for generating output tokens */
int sequence(){
{
while (state[t] = M1){
rnd = unifRand();
if (0.00 < rnd <= 0.10) letter[t] = A;
if (0.10 < rnd <= 0.90) letter[t] = C;
if (0.90 < rnd <= 0.94) letter[t] = G;
if (0.94 < rnd <= 1.00) letter[t] = T;
}
return(0);
}