We are going to develop a recurrence relation for the score of the best alignment between two sequences $a$ and $b$ using the principle of mathematical induction.
This is a common, useful and reliable way to develop algorithms. We will write $A(i,j)$ to denote the best alignment score between the prefix sequences $a_1 \dots a_i$ and $b_1 \dots b_j$. So, assume that we know $A(i',j')$ for every pair $(i',j')$ that precedes the pair $(i,j)$. For our purposes, we will say that $(i',j')$ precedes $(i,j)$ if either $i' < i$ and $j' \le j$, or $i' \le i$ and $j' < j$. In other words, if you think of $(i',j')$ and $(i,j)$ as points in the Cartesian plane, then $(i',j')$ precedes $(i,j)$ if it is below and to the left. At most one of "below" and "to the left" can be replaced by equality.
We proceed to derive a recursive equation for $A(i,j)$.
How does the best alignment for $a_1 \dots a_i$ and $b_1 \dots b_j$ relate to its predecessors? There are three possible scenarios:
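1. PREVIOUS ALIGNMENT, then x above y: the last column aligns x with y.
2. PREVIOUS ALIGNMENT, then - above y: the last column aligns a gap with y.
3. PREVIOUS ALIGNMENT, then x above -: the last column aligns x with a gap.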
Now let's do some analysis for each of these cases:
1. x above y: In this case, the score of the best previous alignment is (using our inductive hypothesis) $A(i-1,j-1)$. We will add to that either the MATCH score (if $x = y$) or the MISMATCH score (otherwise). Let's denote that additional amount by $s(x,y)$, or, removing our abbreviation, $s(a_i,b_j)$. In this case the calculated new alignment score would be $A(i-1,j-1) + s(a_i,b_j)$.
2. - above y: In this case, the score of the best previous alignment is (again using the induction hypothesis) $A(i,j-1)$. We need to add on the gap penalty. Let's denote the gap penalty by $g$. In this case, therefore, the calculated new alignment score would be $A(i,j-1) + g$.
3. x above -: By a similar argument, the calculated new alignment score in this case is $A(i-1,j) + g$.
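Since we want the highest possible score, we must choose the case that leads to the largest value for the calculated new alignment score. $A(i,j)$ is thus the maximum of $A(i-1,j-1) + s(a_i,b_j)$, $A(i,j-1) + g$, and $A(i-1,j) + g$.
We have derived the recurrence
$$A(i,j) = \max\{\, A(i-1,j-1) + s(a_i,b_j),\; A(i,j-1) + g,\; A(i-1,j) + g \,\}$$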
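To see the recurrence in action, here is a single cell worked by hand. The scores (MATCH = 1, MISMATCH = -1, $g = -2$) and the three neighboring values are made-up assumptions for illustration only; they are not taken from the lab. Suppose $a_i = b_j$, so that $s(a_i,b_j) = 1$, and suppose $A(i-1,j-1) = 3$, $A(i,j-1) = 1$ and $A(i-1,j) = 4$.

```latex
\begin{aligned}
A(i,j) &= \max\{\, A(i-1,j-1) + s(a_i,b_j),\; A(i,j-1) + g,\; A(i-1,j) + g \,\} \\
       &= \max\{\, 3 + 1,\; 1 + (-2),\; 4 + (-2) \,\} \\
       &= \max\{\, 4,\; -1,\; 2 \,\} = 4
\end{aligned}
```

Under these assumed values the first case wins, so the best alignment ends by aligning $a_i$ with $b_j$.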
This immediately suggests the program portion:

    int A(int i, int j) {
        if ...                 // ... denotes our yet to be determined base cases
            then return ...    // to be determined
        else
            return max(A(i-1,j-1) + s(a[i],b[j]),
                       A(i,j-1) + g,
                       A(i-1,j) + g);
    }
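The recursive definition can be turned into a runnable Java sketch once some base cases are chosen. Everything below that is not in the text above is an assumption made for illustration: the class name `RecursiveAlign`, the helper `score`, the score values (MATCH = 1, MISMATCH = -1, gap = -2), the memo table, and in particular the base cases, which here treat a prefix aligned against an empty prefix as all gaps. The lab leaves the base cases to be determined, so take this as one plausible choice rather than the intended answer.

```java
import java.util.Arrays;

// Hypothetical sketch of the recurrence; this is not the lab's NW.java.
class RecursiveAlign {
    // Assumed score values; the lab may use different ones.
    static final int MATCH = 1, MISMATCH = -1, GAP = -2;

    // s(x, y): MATCH when the two characters agree, MISMATCH otherwise.
    static int s(char x, char y) {
        return x == y ? MATCH : MISMATCH;
    }

    // Best score for aligning the first i characters of a with the first j of b.
    static int A(String a, String b, int i, int j, int[][] memo) {
        // Assumed base cases: a prefix aligned against nothing is all gaps.
        if (i == 0) return j * GAP;
        if (j == 0) return i * GAP;
        // Reuse a stored result so each (i, j) is computed only once.
        if (memo[i][j] != Integer.MIN_VALUE) return memo[i][j];
        // Java strings are 0-indexed, so a_i is a.charAt(i - 1).
        int diag = A(a, b, i - 1, j - 1, memo) + s(a.charAt(i - 1), b.charAt(j - 1));
        int left = A(a, b, i, j - 1, memo) + GAP; // b_j aligned with a gap
        int up = A(a, b, i - 1, j, memo) + GAP;   // a_i aligned with a gap
        memo[i][j] = Math.max(diag, Math.max(left, up));
        return memo[i][j];
    }

    // Convenience wrapper: score of the best alignment of the full strings.
    static int score(String a, String b) {
        int[][] memo = new int[a.length() + 1][b.length() + 1];
        for (int[] row : memo) Arrays.fill(row, Integer.MIN_VALUE);
        return A(a, b, a.length(), b.length(), memo);
    }

    public static void main(String[] args) {
        System.out.println(score("AC", "A"));
    }
}
```

With these assumed scores, `RecursiveAlign.score("AC", "A")` evaluates to -1: A is matched with A (+1) and C is aligned with a gap (-2). The memo table is an addition; without it the plain recursion recomputes the same predecessors many times.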
In view of what we've just seen, look carefully at my program NW.java. It is a simple implementation of Needleman-Wunsch. Next lab, you will be modifying and extending it. This lab, you'll run it to check your hand-derived arrays. For the next exercises, at least one member of the team should work on producing the matrices by hand, and at least one should adapt and run the program to check the handiwork of their team members.
This program outputs what we'll call a "dynamic programming matrix" that can be used to produce a corresponding alignment (we'll do that next lab). The next exercises expect you to generate small dynamic programming matrices by hand and check them by program.
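For checking a hand-built matrix, it may help to see how such a matrix can be filled cell by cell. The sketch below is not NW.java and may differ from it in layout and scoring. The class name `DpMatrix`, the method `fill`, the score values (MATCH = 1, MISMATCH = -1, gap = -2) and the first row and column (multiples of the gap penalty) are all assumptions for illustration.

```java
import java.util.Arrays;

// Hypothetical bottom-up fill of a dynamic programming matrix; not the lab's NW.java.
class DpMatrix {
    // Assumed score values; the lab may use different ones.
    static final int MATCH = 1, MISMATCH = -1, GAP = -2;

    // Returns the (n+1) x (m+1) matrix whose cell [i][j] is the best score
    // for aligning the first i characters of a with the first j of b.
    static int[][] fill(String a, String b) {
        int n = a.length(), m = b.length();
        int[][] A = new int[n + 1][m + 1];
        // Assumed first column and row: a prefix against nothing is all gaps.
        for (int i = 0; i <= n; i++) A[i][0] = i * GAP;
        for (int j = 0; j <= m; j++) A[0][j] = j * GAP;
        // Row by row, so every predecessor is filled before it is needed.
        for (int i = 1; i <= n; i++) {
            for (int j = 1; j <= m; j++) {
                int s = a.charAt(i - 1) == b.charAt(j - 1) ? MATCH : MISMATCH;
                int northwest = A[i - 1][j - 1] + s; // a_i aligned with b_j
                int west = A[i][j - 1] + GAP;        // b_j aligned with a gap
                int north = A[i - 1][j] + GAP;       // a_i aligned with a gap
                A[i][j] = Math.max(northwest, Math.max(west, north));
            }
        }
        return A;
    }

    public static void main(String[] args) {
        // Print one row of the matrix per line.
        for (int[] row : fill("AC", "AG")) System.out.println(Arrays.toString(row));
    }
}
```

Under these assumed scores, the matrix for "AC" against "AG" has rows 0, -2, -4, then -2, 1, -1, then -4, -1, 0. The bottom-right cell, 0, is the score of the best alignment of the two full sequences.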
The distance measure is significantly different: instead of a maximum, it needs a minimization over the northwest-, north- and west-derived scores. You will need to change more than just three lines of NW.java to answer the next (and final (whew!!)) exercise.
Deliverable: Show your matrix.