Biology Meets Programming: Bioinformatics for Beginners
Week 1
DNA replication
Origin of replication (ori)
Locating the ori is key for gene therapy, e.g. when designing viral vectors that introduce a therapeutic gene.
Computational approaches to find ori in Vibrio cholerae
Exercise: find Pattern
We'll look for the DnaA box sequence using a sliding window. For this we use the function Replication, which counts how many times a sequence appears in the genome.
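To make the sliding-window idea concrete, here is a minimal sketch of such a counter. The name `pattern_count` is hypothetical and chosen for illustration; it is not necessarily how Replication is written in this repository.

```python
def pattern_count(text: str, pattern: str) -> int:
    """Count occurrences of pattern in text, including overlapping ones.

    Hypothetical helper for illustration only.
    """
    k = len(pattern)
    count = 0
    # Slide a window of length k over every possible start position.
    for i in range(len(text) - k + 1):
        if text[i:i + k] == pattern:
            count += 1
    return count


if __name__ == "__main__":
    # "GCG" starts at positions 0 and 2, so overlapping matches are counted.
    print(pattern_count("GCGCG", "GCG"))  # 2
```

Counting overlapping matches matters here, because a plain `str.count` call would skip occurrences that share letters with a previous match.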
For the second part, we calculate the frequency map of all sequences of length k (k-mers) in the genome. For this we use FrequentWords.
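A frequency map can be built in a single pass over the genome. The sketch below assumes a dictionary keyed by k-mer; the names `frequency_map` and `frequent_words` are hypothetical and may differ from the repository's FrequentWords.

```python
from typing import Dict, List


def frequency_map(text: str, k: int) -> Dict[str, int]:
    """Map every k-mer in text to the number of times it occurs."""
    freq: Dict[str, int] = {}
    for i in range(len(text) - k + 1):
        kmer = text[i:i + k]
        freq[kmer] = freq.get(kmer, 0) + 1
    return freq


def frequent_words(text: str, k: int) -> List[str]:
    """Return the k-mers with the highest count, sorted alphabetically."""
    freq = frequency_map(text, k)
    if not freq:
        # text is shorter than k, so there are no k-mers at all.
        return []
    max_count = max(freq.values())
    return sorted(kmer for kmer, count in freq.items() if count == max_count)


if __name__ == "__main__":
    print(frequent_words("ACGTTGCATGTCGCATGATGCATGAGAGCT", 4))  # ['CATG', 'GCAT']
```

Building the map once avoids re-scanning the genome for each candidate k-mer, which is what calling a counter per k-mer would do.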
Exercise: Find the reverse complement of a sequence
We're going to generate the reverse complement of a sequence: each nucleotide is replaced by its complement, and the result is reversed so that it is again read in the 5' -> 3' direction. For this we use ReverseComplement. After running our function on the Vibrio cholerae genome, we find that some of the frequent k-mers are reverse complements of other frequent ones.
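As an illustration, the reverse complement can be computed by complementing each base and then reversing the string. This is a sketch that assumes an uppercase A/C/G/T alphabet; `reverse_complement` is a hypothetical name, not necessarily the repository's ReverseComplement.

```python
# Translation table mapping each base to its complement (A<->T, C<->G).
# Assumes uppercase input containing only A, C, G and T.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(pattern: str) -> str:
    """Return the reverse complement of a DNA string."""
    # Complement every base, then reverse so the result reads 5' -> 3'.
    return pattern.translate(COMPLEMENT)[::-1]


if __name__ == "__main__":
    print(reverse_complement("AAAACCCGGT"))  # ACCGGGTTTT
    print(reverse_complement("ATGATCAAG"))   # CTTGATCAT
```

The second call shows a pair of 9-mers that are reverse complements of each other, the kind of relationship described above.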
Exercise: Find a subsequence within a sequence
We're going to find the occurrences of a subsequence inside a sequence and save the index of the first letter of each occurrence. This time we use PatternMatching. After running our function on the Vibrio cholerae genome, we find that the 9-mers with the highest frequency appear in clusters. This is strong statistical evidence that these subsequences are DnaA boxes.
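Recording start positions uses the same sliding window as counting, but keeps the indices instead of a total. The sketch below uses 0-based indices and a hypothetical name, `pattern_matching`; the repository's PatternMatching may differ.

```python
from typing import List


def pattern_matching(pattern: str, genome: str) -> List[int]:
    """Return the 0-based start index of every occurrence of pattern in genome."""
    k = len(pattern)
    # Overlapping occurrences are reported as separate positions.
    return [i for i in range(len(genome) - k + 1) if genome[i:i + k] == pattern]


if __name__ == "__main__":
    print(pattern_matching("ATAT", "GATATATGCATATACTT"))  # [1, 3, 9]
```

With the positions available, it becomes possible to see whether matches lie close together or are spread across the genome.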
Computational approaches to find ori in any bacteria
Now that we're fairly confident about the DnaA box sequences we found, we check whether they are a common pattern in other bacteria. We look for occurrences of these sequences in Thermotoga petrophila using Replication.
After running it, we observe that the sequences found in Vibrio cholerae do not occur in Thermotoga petrophila. We can conclude that different bacteria can have different DnaA boxes.
We therefore have to try another computational approach: finding k-mers that form clusters, i.e. that are repeated several times within a short interval of the genome.
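One possible way to express this idea in code is sketched below. It is an assumption about how such a search could look, not the course's or the repository's implementation: the function name `find_clumps` and the parameters `window` (interval length) and `t` (minimum number of repeats) are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List


def find_clumps(genome: str, k: int, window: int, t: int) -> List[str]:
    """Return k-mers that occur at least t times inside some interval of
    length `window`. All parameter names are illustrative assumptions."""
    # Record every start position of every k-mer (positions stay sorted).
    positions: Dict[str, List[int]] = defaultdict(list)
    for i in range(len(genome) - k + 1):
        positions[genome[i:i + k]].append(i)

    clumps = set()
    for kmer, starts in positions.items():
        # Check each run of t consecutive occurrences of this k-mer.
        for j in range(len(starts) - t + 1):
            # Span from the first start to the end of the t-th occurrence.
            if starts[j + t - 1] + k - starts[j] <= window:
                clumps.add(kmer)
                break
    return sorted(clumps)


if __name__ == "__main__":
    print(find_clumps("ACGACGACG", k=3, window=9, t=3))  # ['ACG']
```

Because this search does not need a known DnaA box as input, it can be run on a genome where the Vibrio cholerae sequences are absent.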