From aedcf5930f459e2f9bdd8730d5ae2ed2e7be51f3 Mon Sep 17 00:00:00 2001
From: coolneng
Date: Mon, 21 Oct 2019 19:21:49 +0200
Subject: [PATCH] Finish Week 1 tasks

---
 Notebook.org | 17 ++++++++++++++---
 1 file changed, 14 insertions(+), 3 deletions(-)

diff --git a/Notebook.org b/Notebook.org
index 6896000..01619f2 100644
--- a/Notebook.org
+++ b/Notebook.org
@@ -8,7 +8,7 @@
 
  Locating an ori is key for gene therapy (e.g. viral vectors), to introduce a theraupetic gene.
 
-**** Exercises: computational approaches to find ori in Vibrio Cholerae
+**** Computational approaches to find ori in Vibrio Cholerae
 
 ***** Exercise: find Pattern
 
@@ -21,15 +21,26 @@
  We're going to generate the reverse complement of a sequence, which is the complement of a sequence, read in the same direction (5' -> 3').
  In this case, we're going to use [[./Code/ReverseComplement.py][ReverseComplement]]
 
- After using our function on the Vibrio Cholerae's genome, we realize that some of the frequent k-mers are reverse complements of other frequent ones.
+ After using our function on the /Vibrio Cholerae's/ genome, we realize that some of the frequent k-mers are reverse complements of other frequent ones.
 
 ***** Exercise: Find a subsequence within a sequence
 
  We're going to find the ocurrences of a subsquence inside a sequence, and save the index of the first letter in the sequence.
  This time, we'll use [[./Code/PatternMatching.py][PatternMatching]]
 
- After using our function on the Vibrio Cholerae's genome, we find out that the /9-mers/ with the highest frequency appear in cluster.
+ After using our function on the /Vibrio Cholerae's/ genome, we find out that the /9-mers/ with the highest frequency appear in clusters.
  This is strong statistical evidence that our subsequences are /DnaA boxes/.
+
+**** Computational approaches to find ori in any bacteria
+
+ Now that we're fairly confident about the /DnaA box/ sequences that we found, we are going to check whether they are a common pattern in other bacteria.
+ We're going to find the occurrences of the sequences in /Thermotoga petrophila/ using [[./Code/Replication.py][Replication]]
+
+ After the execution, we observe that there are *no* occurrences of the sequences found in /Vibrio Cholerae/.
+ We can conclude that different bacteria can have different /DnaA boxes/.
+
+ We have to try another computational approach, then: finding clusters of /k-mers/ repeated in a small interval.
+
 *** Vocabulary
 - k-mer: subsquences of length /k/ in a biological sequence
 - Frequency map: sequence --> frequency of the sequence
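
The three steps described in the notebook (reverse complement, pattern matching, and looking for k-mers that cluster in a small interval) can be sketched as follows. This is a minimal illustration, not the contents of ReverseComplement.py, PatternMatching.py or Replication.py, which are not shown in this patch. The function names, the 0-based indexing, and the `window` and `min_count` parameters are all assumptions made for the example.

```python
from collections import defaultdict


def reverse_complement(sequence):
    """Return the reverse complement of a DNA string, read 5' -> 3'."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(complement[base] for base in reversed(sequence))


def pattern_matching(pattern, genome):
    """Return the 0-based start index of every occurrence, overlaps included."""
    k = len(pattern)
    return [i for i in range(len(genome) - k + 1) if genome[i:i + k] == pattern]


def find_clumps(genome, k, window, min_count):
    """Return the k-mers that occur at least min_count times within
    some interval of length `window` (hypothetical parameter names)."""
    # Record the start positions of every k-mer in a single pass.
    positions = defaultdict(list)
    for i in range(len(genome) - k + 1):
        positions[genome[i:i + k]].append(i)

    clumps = set()
    for kmer, starts in positions.items():
        # Pair each occurrence with the one min_count - 1 places later;
        # if that span fits in the window, the k-mer forms a clump.
        for first, last in zip(starts, starts[min_count - 1:]):
            if last + k - first <= window:
                clumps.add(kmer)
                break
    return clumps
```

On a real genome, something like `find_clumps(genome, 9, 500, 3)` would list the 9-mers that appear at least three times within some 500-base interval. Those values are illustrative only and would need tuning for each organism.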