Biology Meets Programming: Bioinformatics for Beginners

Week 1

DNA replication

Origin of replication (ori)

Locating an ori is key for gene therapy, where a vector (e.g. a viral vector) is used to introduce a therapeutic gene.

Exercises: computational approaches to find ori in Vibrio cholerae
Exercise: find Pattern

We'll look for the DnaA box sequence using a sliding window. For this we use the function Replication, which counts how many times a sequence appears in the genome.
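
The sliding-window count can be sketched as follows. This is a minimal illustration, not the notebook's own code: the name `pattern_count` and the toy input are assumptions made for the example. The window moves one position at a time, so overlapping occurrences are counted.

```python
def pattern_count(text, pattern):
    """Count occurrences of pattern in text, including overlapping ones.

    Hypothetical helper for illustration; not the notebook's own function.
    """
    k = len(pattern)
    count = 0
    # Slide a window of length k over every valid start position.
    for i in range(len(text) - k + 1):
        if text[i:i + k] == pattern:
            count += 1
    return count


# "GCG" occurs at positions 0 and 2 of "GCGCG" (the matches overlap).
print(pattern_count("GCGCG", "GCG"))  # prints 2
```

Note that Python's built-in `str.count` skips overlapping matches, which is why the explicit loop is used here.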

For the second part, we compute the frequency map of all sequences of length k (k-mers) in the genome. For that purpose we use FrequentWords.
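
One possible way to build the frequency map and read off the most frequent k-mers is sketched below. The names `frequency_map` and `frequent_words` and the short test string are assumptions for illustration; the notebook's FrequentWords may be organised differently.

```python
def frequency_map(text, k):
    """Map every k-mer in text to the number of times it occurs."""
    freq = {}
    for i in range(len(text) - k + 1):
        kmer = text[i:i + k]
        freq[kmer] = freq.get(kmer, 0) + 1
    return freq


def frequent_words(text, k):
    """Return the k-mers with the highest count, sorted alphabetically."""
    freq = frequency_map(text, k)
    if not freq:
        # text is shorter than k, so there are no k-mers at all
        return []
    top = max(freq.values())
    return sorted(kmer for kmer, n in freq.items() if n == top)


# Toy input (an assumption for the example, not genome data).
print(frequent_words("CGATATATCCATAG", 3))  # prints ['ATA']
```

Building the map takes a single pass over the text, so the most frequent k-mers are found without recounting each candidate separately.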

Exercise: Find the reverse complement of a sequence

We're going to generate the reverse complement of a sequence: the complement of the sequence, reversed so that it is again read in the 5' -> 3' direction. For this we use ReverseComplement. After applying our function to the Vibrio cholerae genome, we realize that some of the frequent k-mers are reverse complements of other frequent ones.
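
A minimal sketch of the reverse complement, assuming the sequence contains only the upper-case bases A, C, G and T. The name `reverse_complement` and the lookup table are illustrative choices, not necessarily how the notebook's ReverseComplement is written.

```python
# Watson-Crick base pairing: A pairs with T, C pairs with G.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(pattern):
    """Return the complementary strand, read in its own 5' -> 3' direction.

    Assumes pattern contains only A, C, G, T; any other character
    raises a KeyError.
    """
    return "".join(COMPLEMENT[base] for base in reversed(pattern))


print(reverse_complement("AAAACCCGGT"))  # prints ACCGGGTTTT
```

Applying the function twice returns the original sequence, which is a quick sanity check for any implementation.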

Exercise: Find a subsequence within a sequence

We're going to find the occurrences of a subsequence inside a sequence and save the index of the first letter of each occurrence. This time we use PatternMatching. After applying our function to the Vibrio cholerae genome, we find that the 9-mers with the highest frequency appear in clusters. This is strong statistical evidence that these subsequences are DnaA boxes.
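
Collecting the start positions can be sketched like this. The name `pattern_matching`, the argument order and the toy genome are assumptions for the example; positions are 0-based and overlapping occurrences are included.

```python
def pattern_matching(pattern, genome):
    """Return the 0-based start index of every occurrence of pattern in genome."""
    k = len(pattern)
    return [
        i
        for i in range(len(genome) - k + 1)
        if genome[i:i + k] == pattern
    ]


# Toy input (an assumption for the example, not genome data).
print(pattern_matching("ATAT", "GATATATGCATATACTT"))  # prints [1, 3, 9]
```

Start positions that lie close together in the returned list are what shows up as a cluster of occurrences in the genome.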

Vocabulary

  • k-mer: a subsequence of length k in a biological sequence
  • Frequency map: a mapping from each sequence to the number of times it occurs (sequence -> frequency of the sequence)