Finish Week 1 tasks
parent dc740e1c54
commit aedcf5930f
17
Notebook.org
@@ -8,7 +8,7 @@
Locating an ori is key for gene therapy, where a vector (e.g. a viral vector) is used to introduce a therapeutic gene.
**** Exercises: computational approaches to find ori in Vibrio cholerae
**** Computational approaches to find ori in Vibrio cholerae
***** Exercise: find Pattern
@@ -21,15 +21,26 @@
We're going to generate the reverse complement of a sequence: the complementary strand, read in its own 5' -> 3' direction (i.e. the complement of the sequence, reversed).
In this case, we're going to use [[./Code/ReverseComplement.py][ReverseComplement]]
After using our function on the Vibrio cholerae genome, we realize that some of the frequent k-mers are reverse complements of other frequent ones.
After using our function on the /Vibrio cholerae/ genome, we realize that some of the frequent k-mers are reverse complements of other frequent ones.
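As a minimal sketch of the idea (not necessarily what ReverseComplement.py does), the reverse complement can be computed by complementing every base and then reversing the result. The function name ~reverse_complement~ and the restriction to uppercase A, C, G and T are assumptions made for this example.

```python
# Hypothetical helper for illustration; the notebook's ReverseComplement.py may differ.
# Translation table mapping each base to its complement (uppercase DNA only).
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(sequence: str) -> str:
    """Return the reverse complement of a DNA string made of A, C, G and T."""
    # Complement every base, then reverse so the result reads 5' -> 3'.
    return sequence.translate(COMPLEMENT)[::-1]


print(reverse_complement("ATGATCAAG"))  # CTTGATCAT
```

With a helper like this, checking whether two frequent k-mers are reverse complements of each other is a single comparison: ~reverse_complement(a) == b~.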
***** Exercise: Find a subsequence within a sequence
We're going to find the occurrences of a subsequence inside a sequence, and save the index of the first letter of each occurrence.
This time, we'll use [[./Code/PatternMatching.py][PatternMatching]]
After using our function on the Vibrio cholerae genome, we find out that the /9-mers/ with the highest frequency appear in clusters.
After using our function on the /Vibrio cholerae/ genome, we find out that the /9-mers/ with the highest frequency appear in clusters.
This is strong statistical evidence that our subsequences are /DnaA boxes/.
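The search described in this exercise can be sketched as follows. This is only an illustration under assumptions: the name ~pattern_matching~, the argument order and the 0-based indexing are choices made here, and PatternMatching.py may be written differently. Overlapping occurrences are reported too, since the search resumes one position after each hit.

```python
def pattern_matching(pattern: str, genome: str) -> list[int]:
    """Return the 0-based start index of every occurrence of pattern in genome."""
    positions = []
    start = genome.find(pattern)
    while start != -1:
        positions.append(start)
        # Resume one character later so overlapping matches are not skipped.
        start = genome.find(pattern, start + 1)
    return positions


print(pattern_matching("ATAT", "GATATATGCATATACTT"))  # [1, 3, 9]
```

Start indices that lie close together are what we would read as a cluster of the pattern.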
**** Computational approaches to find ori in any bacterium

Now that we're fairly confident about the /DnaA box/ sequences that we found, we are going to check whether they are a common pattern in other bacteria.
We're going to find the occurrences of the sequences in /Thermotoga petrophila/ using [[./Code/Replication.py][Replication]].

After the execution, we observe that there are *no* occurrences of the sequences found in /Vibrio cholerae/.
We can conclude that different bacteria may have different /DnaA boxes/.

We have to try another computational approach, then: finding clusters of /k-mers/ that are repeated within a small interval.
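One way to look for such clusters, sketched here under assumptions, is to record where every /k-mer/ starts and then check whether enough of its occurrences fit inside one window of the genome. The function name ~find_clumps~ and the parameters ~window~ (interval length) and ~min_count~ (minimum number of repeats) are hypothetical and do not come from the notebook's code.

```python
from collections import defaultdict


def find_clumps(genome: str, k: int, window: int, min_count: int) -> set[str]:
    """Return the k-mers that occur at least min_count times within some window."""
    # Record the start position of every k-mer in a single pass.
    positions = defaultdict(list)
    for i in range(len(genome) - k + 1):
        positions[genome[i:i + k]].append(i)

    clumps = set()
    for kmer, starts in positions.items():
        # Pair each occurrence with the one min_count - 1 places after it.
        for first, last in zip(starts, starts[min_count - 1:]):
            # All of them fit in one window if the last copy ends within it.
            if last + k - first <= window:
                clumps.add(kmer)
                break
    return clumps


genome = "CGGACTCGACAGATGTGAAGAACGACAATGTGAAGACTCGACACGACAGAGTGAAGAGAAGAGGAAACATTGTAA"
print(sorted(find_clumps(genome, k=5, window=50, min_count=4)))  # ['CGACA', 'GAAGA']
```

Because this approach does not need the /DnaA box/ sequence in advance, it can be run on a genome where the boxes are still unknown.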
*** Vocabulary
- k-mer: a subsequence of length /k/ in a biological sequence
- Frequency map: a mapping from each sequence to its frequency (sequence --> number of occurrences)
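To make the two terms concrete, here is a small sketch that builds a frequency map of all /k-mers/ in a sequence and reads the most frequent ones from it. The names ~frequency_map~ and ~most_frequent_kmers~ are assumptions for this example, and overlapping occurrences are counted.

```python
from collections import Counter


def frequency_map(sequence: str, k: int) -> Counter:
    """Map every k-mer in sequence to the number of times it occurs."""
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))


def most_frequent_kmers(sequence: str, k: int) -> list[str]:
    """Return the k-mers with the highest count, in alphabetical order."""
    freq = frequency_map(sequence, k)
    if not freq:
        return []
    top = max(freq.values())
    return sorted(kmer for kmer, count in freq.items() if count == top)


print(most_frequent_kmers("ACGTTGCATGTCGCATGATGCATGAGAGCT", 4))  # ['CATG', 'GCAT']
```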