bioinformatics-course/Notebook.org
2019-10-21 19:21:49 +02:00


Biology Meets Programming: Bioinformatics for Beginners

Week 1

DNA replication

Origin of replication (ori)

Locating the ori is key for gene therapy: viral vectors used to introduce a therapeutic gene need an ori in order to replicate.

Computational approaches to find ori in Vibrio cholerae
Exercise: find Pattern

We'll look for the DnaA box sequence using a sliding window. In this case we will use the function Replication to find out how many times a sequence appears in the genome.
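The sliding-window count described above can be sketched as follows. This is only a minimal illustration: the name pattern_count is hypothetical and is not the notebook's Replication function, and overlapping occurrences are assumed to count separately.

```python
def pattern_count(text: str, pattern: str) -> int:
    """Count occurrences of `pattern` in `text`, overlaps included.

    Hypothetical helper for illustration; not the notebook's own code.
    """
    k = len(pattern)
    count = 0
    # Slide a window of length k over the text, one position at a time.
    for i in range(len(text) - k + 1):
        if text[i:i + k] == pattern:
            count += 1
    return count
```

The built-in str.count is not a substitute here, because it skips overlapping matches.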

For the second part, we're going to calculate the frequency map of the sequences of length k. For that purpose we'll use FrequentWords.
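One possible way to build such a frequency map, and to read the most frequent k-mers off it, is sketched below. The names frequency_map and frequent_words are assumptions made for this example and may differ from the notebook's FrequentWords.

```python
def frequency_map(text: str, k: int) -> dict:
    """Map every k-mer in `text` to the number of times it occurs."""
    freq = {}
    for i in range(len(text) - k + 1):
        kmer = text[i:i + k]
        freq[kmer] = freq.get(kmer, 0) + 1
    return freq


def frequent_words(text: str, k: int) -> list:
    """Return the k-mers with the highest count, sorted alphabetically."""
    freq = frequency_map(text, k)
    if not freq:
        # Text shorter than k: there are no k-mers at all.
        return []
    best = max(freq.values())
    return sorted(kmer for kmer, count in freq.items() if count == best)
```

Building the map takes a single pass over the text, so it avoids re-counting each k-mer from scratch.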

Exercise: Find the reverse complement of a sequence

We're going to generate the reverse complement of a sequence: the complementary strand read in its own 5' -> 3' direction, that is, the complement of the sequence, reversed. In this case, we're going to use ReverseComplement. After using our function on the Vibrio cholerae genome, we realize that some of the frequent k-mers are reverse complements of other frequent ones.
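A minimal sketch of the reverse complement, assuming the sequence contains only the uppercase letters A, C, G and T. The name reverse_complement is hypothetical and not necessarily how the notebook's ReverseComplement is written.

```python
# Translation table pairing each nucleotide with its complement.
# Assumption: input uses only uppercase A, C, G, T.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Complement every base, then reverse so the result reads 5' -> 3'."""
    return seq.translate(COMPLEMENT)[::-1]
```

Any character outside A, C, G, T is passed through unchanged by str.translate, so real genome files with other symbols would need extra handling.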

Exercise: Find a subsequence within a sequence

We're going to find the occurrences of a subsequence inside a sequence and save the index of the first letter of each occurrence. This time, we'll use PatternMatching. After using our function on the Vibrio cholerae genome, we find that the 9-mers with the highest frequency appear in clusters. This is strong statistical evidence that our subsequences are DnaA boxes.
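The position search can be sketched like this. The name pattern_matching and the use of 0-based indices are assumptions of this example, not something stated in the notebook.

```python
def pattern_matching(pattern: str, genome: str) -> list:
    """Return the 0-based start index of every occurrence of `pattern`.

    Overlapping occurrences are all reported.
    """
    k = len(pattern)
    return [
        i
        for i in range(len(genome) - k + 1)
        if genome[i:i + k] == pattern
    ]
```

Looking at the gaps between consecutive indices is a quick way to see whether the occurrences are clustered.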

Computational approaches to find ori in any bacteria

Now that we're pretty confident about the DnaA box sequences that we found, we are going to check whether they are a common pattern in other bacteria. We're going to find the occurrences of the sequences in Thermotoga petrophila using Replication.

After the execution, we observe that there are no occurrences of the sequences found in Vibrio cholerae. We can conclude that different bacteria have different DnaA boxes.

We have to try another computational approach, then: finding clusters of k-mers that are repeated within a small interval.
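One way to look for such clusters is sketched below, under stated assumptions: a k-mer counts as clustered when at least min_count of its occurrences fit entirely inside some stretch of window nucleotides. The function name and all three parameters are hypothetical, and min_count is assumed to be at least 1.

```python
from collections import defaultdict


def find_clumps(genome: str, k: int, window: int, min_count: int) -> list:
    """Return k-mers with at least `min_count` occurrences inside some
    stretch of `window` nucleotides (assumes min_count >= 1)."""
    # Record every start position of every k-mer, in increasing order.
    positions = defaultdict(list)
    for i in range(len(genome) - k + 1):
        positions[genome[i:i + k]].append(i)

    clumps = set()
    for kmer, starts in positions.items():
        # Check each run of min_count consecutive occurrences.
        for j in range(len(starts) - min_count + 1):
            first = starts[j]
            last = starts[j + min_count - 1]
            # The run fits if the last occurrence ends within the window.
            if last + k - first <= window:
                clumps.add(kmer)
                break
    return sorted(clumps)
```

Because this only compares stored positions, it does not need to know the DnaA box sequence in advance, which is the point of the approach.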

Vocabulary

  • k-mer: a subsequence of length k in a biological sequence
  • Frequency map: a mapping from each sequence to its frequency (sequence -> number of occurrences)