COSC 348: Lab05

Hidden Markov Models (HMMs)

IMPORTANT: submit the code and its output, i.e. 20 generated and scored sequences. Worth 1%.
If you do not finish the coding during the lab, you can continue at home and submit later by the end of the week.


Coding task

Write a code that will generate and score 20 random DNA sequences based on the HMM provided below. A random sequence will correspond to a random walk starting on the "Start" state and finishing on the "End" state.
Di, Ii, and Mi are the deletion, insertion and match states respectively. The transitions (arrows) from a state q to a state r will have a probability T(r|q) which can be found in the Figure. These transition probabilities will favour some paths more than others.

The following Table lists the probability distributions P(x|q) of the DNA bases in state q.

ACGT
M10.100.800.040.06
M20.400.550.040.01
M30.730.180.020.07
I10.150.290.330.23
I20.440.510.020.03
I30.700.080.190.03
I40.010.010.080.90

Note: You may want to use logarithms for scoring your random sequences.



/* The C-style pseudo code for state transitions */

double unifRand() {    
			return rand() / double(RAND_MAX);
} 


int main(){    
	seed();  // new random seed
	state[0] = Start;    
	rnd = unifRand();  // generate a random number between 0 and 1 according to uniform distribution


	if (0.0 < rnd <= 0.3) state[1] = D1;
    if (0.3 < rnd <= 0.7) state[1] = I1;
    if (0.7 < rnd <= 1.0) state[1] = M1;   
                 	
	until (state[t] = End){        
		rnd = unifRand();       
		// procedure for transition to a new state comes here

		if(state[t] = M1){
			rnd = unifRand(); 
			// procedure to generate output token using M probabilities comes here
                    }      
 
   		 if(state[t] = I1){
			rnd = unifRand(); 
			// procedure to generate output token using I probabilities comes here
				}
 		
		if(state[t] = D1){
			return "-"; // return gap
                    }

		t = t + 1; // move to the next position in the sequence
	}    
return(0);
} 
    
/* The C-style pseudo code for generating output tokens */

int sequence(){  
{     
	while (state[t] = M1){        
		rnd = unifRand();        
		if (0.00 < rnd <= 0.10) letter[t] = A;            
		if (0.10 < rnd <= 0.90) letter[t] = C;
 		if (0.90 < rnd <= 0.94) letter[t] = G;            
		if (0.94 < rnd <= 1.00) letter[t] = T;
	}    
return(0);
}


Cosc348 home
Cosc348 labs