Add PatternMatching function
parent f5609b5577
commit dc740e1c54
							
								
								
									
Code/PatternMatching.py (Normal file, 6 lines changed)
						
									
							| @ -0,0 +1,6 @@ | |||||||
|  | def PatternMatching(Pattern, Genome): | ||||||
|  |     positions = [] | ||||||
|  |     for i in range(len(Genome)-len(Pattern)+1): | ||||||
|  |         if Genome[i:i+len(Pattern)] == Pattern: | ||||||
|  |             positions.append(i) | ||||||
|  |     return positions | ||||||
| @ -21,8 +21,14 @@ | |||||||
|       |       | ||||||
|       We're going to generate the reverse complement of a sequence, which is the complement of a sequence, read in the same direction (5' -> 3'). |       We're going to generate the reverse complement of a sequence, which is the complement of a sequence, read in the same direction (5' -> 3'). | ||||||
|       In this case, we're going to use [[./Code/ReverseComplement.py][ReverseComplement]]  |       In this case, we're going to use [[./Code/ReverseComplement.py][ReverseComplement]]  | ||||||
|       After using our function on the Vibrio's Cholerae genome, we realize that some of the frequent k-mers are reverse complements of other frequent ones. |       After using our function on the Vibrio Cholerae's genome, we realize that some of the frequent k-mers are reverse complements of other frequent ones. | ||||||
|       |       | ||||||
|  | ***** Exercise: Find a subsequence within a sequence | ||||||
|  |        | ||||||
|  |       We're going to find the occurrences of a subsequence inside a sequence, and save the index at which each occurrence starts. | ||||||
|  |       This time, we'll use [[./Code/PatternMatching.py][PatternMatching]]  | ||||||
|  |       After using our function on the Vibrio Cholerae's genome, we find out that the /9-mers/ with the highest frequency appear in clusters. | ||||||
|  |       This is strong statistical evidence that our subsequences are /DnaA boxes/. | ||||||
| 
 | 
 | ||||||
| *** Vocabulary | *** Vocabulary | ||||||
|       - k-mer: subsequences of length /k/ in a biological sequence |       - k-mer: subsequences of length /k/ in a biological sequence | ||||||
|  | |||||||
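As a quick sanity check of the function added in this commit, the sketch below reproduces PatternMatching and runs it on a short string. The pattern and sequence are illustrative values chosen for this example, not data from the Vibrio cholerae genome. Note that overlapping occurrences are reported separately, because the window advances one position at a time.

```python
def PatternMatching(Pattern, Genome):
    # Collect every 0-based start index where Pattern occurs in Genome.
    positions = []
    # Slide a window of len(Pattern) over Genome, one position at a time.
    for i in range(len(Genome) - len(Pattern) + 1):
        if Genome[i:i + len(Pattern)] == Pattern:
            positions.append(i)
    return positions


# Illustrative input (an assumption for this example, not real genome data).
example_positions = PatternMatching("ATAT", "GATATATGCATATACTT")
print(example_positions)  # [1, 3, 9]
```

The matches at indices 1 and 3 overlap, which is the behaviour needed when looking for clustered k-mers. If Pattern is longer than Genome, the range is empty and the function returns an empty list.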