Previous Exams
Alignment
20
a. Complete the table and find the optimal alignment and its score from the table. b. Find the optimal multiple global alignment between v, w, and x, under the sum-of-pairs scoring scheme c. In what order would the Progressive multiple alignment algorithm merge the three strings? d. Create an efficient algorithm that finds the best alignment between a suffix of v and a suffix of w.
19
a. Fill in the global alignment dynamic programming matrix between v and w. What is the optimal alignment? b. Would the k-difference global alignment with k = 2 find the optimal solution? c. Create an efficient algorithm that finds the best alignment between a prefix of v and a prefix of w. d. Create an efficient algorithm that counts the number of additional comparison operations needed for a recursive global alignment solution compared with a dynamic programming solution.
18
a. Complete the table and find the optimal alignment. Is the alignment local or global? b. What is the edit distance between v and w? c. Create an efficient algorithm for computing the palindromic score of v. d. Create an efficient algorithm for finding the longest palindromic substring within an input string v.
17
a. Complete the table and find the optimal alignment and its score. b. Make minimal changes to the pseudocode so that it instead takes a local alignment table and prints the resulting local alignment. c. Modify the pseudocode so that it prints all optimal local alignments.
16
a. Fill in the dynamic programming table for a global alignment of v and w. Find the optimal alignment and its score from the table. b. Define what we mean by a local alignment of two sequences v and w. Describe the changes we need to do. c. Describe how you can modify the dynamic programming algorithm to find the highest scoring alignment for searching for an unknown common adapter fragment.
15
a. Complete the table and find the optimal alignment and its score from the table. b. Show how this instance of the ED problem can be solved by the space-efficient ED algorithm c. Give an O(nm) algorithm which computes the matrix whose (i, j)th entry is the score of the optimal global alignment in which the character vi is aligned with the character wj.
14
a. Complete the table and find the optimal alignment and its score from the table. b. Find the optimal multiple global alignment under the sum-of-pairs scoring scheme. c. Create an efficient algorithm for finding the best alignment between a suffix of v and a suffix of w.
Heuristic alignment
18
a. What is the length of the shortest string query that your index is guaranteed to find if you allow 3 mismatches between queries and hits in T? b. How would changing your index into a 12mer index affect the index’s speed, sensitivity, and specificity for finding query occurrences with up to 3 mismatches in T?
String indexes
20
a. Draw the suffix tree. Add suffix links and show the edge label compression. b. Ukkonen’s algorithm. c. Show how to compute the matching statistics for the first three positions in the string
19
a. Draw the suffix tree. Add suffix links and show the edge label compression. b. Ukkonen’s algorithm: At which phases of the algorithm is the algorithm applying suffix extension rule 3? c. ? d. Describe an efficient algorithm to find the length of the longest proper substring of P that ends at i and matches a prefix of P. e. Describe an efficient algorithm to find the smallest substring of p that occurs in p exactly k times.
18
a. Draw the keyword tree. Add the tree’s non-trivial failure links. b. Draw the generalized suffix tree and add the suffix links. c. Given the generalized suffix tree for a set of strings, create an efficient algorithm for converting it into a keyword tree. d. Describe an efficient algorithm for solving the DNA contamination problem. e. Can the DNA contamination problem can be solved in linear time and in space proportional to min(|S1|, |S2|)?
17
a. Draw the suffix tree for the string "agagaa". Add suffix links and show the edge label compression. b. Is this algorithm correct? c. Create a fast algorithm for computing the minimum length of all the pairwise longest common prefixes over all the pairs of strings. d. Describe an efficient algorithm that for each prefix si of s returns the longest repeated substring within si.
16
a. Draw the suffix tree. Add suffix links and show the edge label compression. b. Describe an efficient algorithm for finding every repeated substring of S. c. Describe an O(n) algorithm for finding all maximal repeated substrings.
15
a. Draw the suffix tree. Add suffix links and show the edge label compression. b. Ukkonen’s algorithm: At which phases of the algorithm is the current implicit suffix tree also a true suffix tree? c. Give a family of strings S of possible infinite length such that its implicit suffix tree only has one root and two leaves. d. Describe an algorithm that for any length l finds the most frequently repeated substring e. Show how to find all possible contaminants in S by using a suffix tree of S.
14
a. Draw the suffix tree. Add suffix links and show the edge label compression. b. With regard to a string s over an alphabet Σ, what is the degree of the root of the suffix tree (T ) of s (T (s))? c. Give an infinite family of strings over a finite alfabet such that the total length of the edge-labels on their suffix trees is Ω(s2), where s is the length of the string. d. Create an algorithm that finds set of all substrings of s1 that are not contained in s2 in O(n) time.
Hidden Markov model
20
a. Draw the state diagram for this Markov Chain. b. What is the most likely state for the chain at t = 100? c. We extend the Markov Chain into an HMM. Explain what this HMM models. d. List all possible state paths Q that can generate O. Calculate the joint probability Pr(Q & O) for two of the paths. e. What is the probability of the observed sequence? f. What is the most probable path from the HMM?
19
a. Draw the state diagram for this Markov Chain. b. Is the Markov Chain aperiodic? If no, describe its periodicity. c. We extend this Markov Chain into an HMM. Explain what this HMM models. d. List all possible state paths Q that can generate O. Calculate the joint probability Pr(Q & O) for two of the paths. e. Apply the Viterbi algorithm to find Pr(Q & O) for the most probable path.
18
a. Write down the transition probability matrix for this Markov Chain. b. We extend the Markov Chain into an HMM. List all possible state paths Q that can generate O. Calculate the joint probability Pr(Q & O) for two of the paths. c. What is the probability of observing the following symbol sequence? d. Observed O = CCC_N_AC. Describe an efficient algorithm that determines the most likely value of N.
17
a. Draw the state diagram for this Markov Chain. b. We extend the Markov Chain into an HMM. List all possible state paths Q that can generate O. Calculate the joint probability Pr(Q & O) for two of the paths. c. Apply the Viterbi algorithm to find Pr(Q & O) for the most probable path. d. Explain how the HMM can be modified to reflect that AT substrings are more frequent than TA substrings in the strings recognized by the HMM.
16
a. Write down the transition probability matrix P for this HMM. Describe the motifs that this HMM can generate. b. List all possible state paths Q that can generate O. Calculate the joint probability Pr(Q & O) for two of the paths. c. Apply the Viterbi algorithm to find Pr(Q & O) for the most probable path. What is the general computational complexity in time and space? d. What is the expected fraction of the sequence containing motifs? What is the expected number of such motif occurrences?
15
a. Draw the state diagram for this Markov Chain. What is the probability of being in state 2 at times t − 1, t, and t + 1? b. In which states are you most likely to find each of the two chains after 10 transitions? c. We extend the Markov Chain into an HMM. List all possible state paths Q that can generate O. Calculate the joint probability Pr(Q & O) for two of the paths. d. Apply the Viterbi algorithm to find Pr(Q & O) for the most probable path.
14
a. Draw the state diagram for this Markov Chain. What is the probability of being in state 0 at time t? b. We extend the Markov Chain into an HMM. Explain what this HMM models. c. List all possible state paths Q that can generate O. Calculate the joint probability Pr(Q & O) for two of the paths. d. Apply the Viterbi algorithm to find Pr(Q & O) for the most probable path.
Motifs
20
a. Find the Position Weight Matrix (PWM) for the multiple alignment. What is the consensus motif and its consensus score? b. Show how the PWM can be represented as an HMM c. Given the PWM, create a ROC-graph that shows the performance of the PWM at correctly classifying P and N
17
a. Construct a PWM (profile) from the following multiple alignment. What is the PWM’s consensus sequence and consensus score? b. Describe an effective algorithm that uses d to measure the distance between two PWMs.
Sequencing by Hybridization (SBH)
19
a. Permissable graph for the SBH Euler-path–based assembly algorithm. b. Complete the vertex labels so that G is permissable for the SBH Euler-path–based assembly algorithm. Determine the measured Spectrum.
18
a. Create an efficient algorithm that finds all such sequence pairs. b. Modify your algorithm so that it can handle the expected number of sequencing errors and still identify sequence pairs with at least 50% overlap.
15
a. Change the multiplicities of edges in E so that the resulting multigraph is a permissable graph for the SBH Euler-path–based assembly algorithm. b. Complete the vertex labels so that G is permissable for the SBH Euler-path–based assembly algorithm. Determine the measured Spectrum. c. Specify the next steps of the SBH Euler-path–based assembly algorithm after G has been constructed. d. Determine all strings that can be output by the algorithm
Euler path
16
a. What is an Euler path? Give an effcient algorithm to determine whether a directed connected graph has a unique Euler path.
Palindromes
14
a. Describe an efficient algorithm that determines if a given string s is a palindrome. b. Describe an efficient algorithm that computes the minimum number of edit operations needed to turn an arbitrary string s into a palindrome.