IMPORTANT: submit the code and its output, i.e. 20 generated and scored sequences. Worth 1%.
If you do not finish the coding during the lab, you can continue at home and submit later by the end of the week.
Write a code that will generate and score 20 random DNA sequences based on the HMM provided below.
A random sequence will correspond to a random walk starting on the "Start" state and finishing
on the "End" state.
Di, Ii, and Mi are the deletion, insertion and
match states respectively. The transitions (arrows) from a state q to a state r will have a
probability T(r|q) which can be found in the Figure. These transition probabilities
will favour some paths more than others.
The following Table lists the probability distributions P(x|q) of the DNA bases in
state q.
A | C | G | T | |
M1 | 0.10 | 0.80 | 0.04 | 0.06 |
M2 | 0.40 | 0.55 | 0.04 | 0.01 |
M3 | 0.73 | 0.18 | 0.02 | 0.07 |
I1 | 0.15 | 0.29 | 0.33 | 0.23 |
I2 | 0.44 | 0.51 | 0.02 | 0.03 |
I3 | 0.70 | 0.08 | 0.19 | 0.03 |
I4 | 0.01 | 0.01 | 0.08 | 0.90 |
Note: You may want to use logarithms for scoring your random sequences.
/* The C-style pseudo code for state transitions */ double unifRand() { return rand() / double(RAND_MAX); } int main(){ seed(); // new random seed state[0] = Start; rnd = unifRand(); // generate a random number between 0 and 1 according to uniform distribution if (0.0 < rnd <= 0.3) state[1] = D1; if (0.3 < rnd <= 0.7) state[1] = I1; if (0.7 < rnd <= 1.0) state[1] = M1; until (state[t] = End){ rnd = unifRand(); // procedure for transition to a new state comes here if(state[t] = M1){ rnd = unifRand(); // procedure to generate output token using M probabilities comes here } if(state[t] = I1){ rnd = unifRand(); // procedure to generate output token using I probabilities comes here } if(state[t] = D1){ return "-"; // return gap } t = t + 1; // move to the next position in the sequence } return(0); }
/* The C-style pseudo code for generating output tokens */ int sequence(){ { while (state[t] = M1){ rnd = unifRand(); if (0.00 < rnd <= 0.10) letter[t] = A; if (0.10 < rnd <= 0.90) letter[t] = C; if (0.90 < rnd <= 0.94) letter[t] = G; if (0.94 < rnd <= 1.00) letter[t] = T; } return(0); }