The Needleman-Wunsch Algorithm


We are going to develop a recurrence relation for the score of the best alignment between two sequences a1, a2, ..., an and b1, b2, ..., bm using the principle of mathematical induction.

This is a common, useful and reliable way to develop algorithms. We will write A(k,l) to denote the best alignment score between the prefixes a1, a2, ..., ak and b1, b2, ..., bl. Assume that we know A(k,l) for every pair (k,l) that precedes the pair (i,j). For our purposes, we will say that (k,l) precedes (i,j) if either (k<i and l<=j) or (l<j and k<=i). In other words, if you think of (i,j) and (k,l) as points in the Cartesian plane, then (k,l) precedes (i,j) if it lies below and to the left, where at most one of "below" and "to the left" can be replaced by equality.
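The precedence relation can be written as a one-line predicate. This is only a small sketch; the class name Precedence and the method name precedes are made up for illustration and do not come from NW.java.

```java
final class Precedence {
    // True when (k,l) precedes (i,j): neither coordinate is larger,
    // and at least one of them is strictly smaller.
    static boolean precedes(int k, int l, int i, int j) {
        return (k < i && l <= j) || (l < j && k <= i);
    }

    public static void main(String[] args) {
        System.out.println(precedes(2, 3, 3, 3)); // true: k is smaller, l is equal
        System.out.println(precedes(3, 3, 3, 3)); // false: a pair does not precede itself
        System.out.println(precedes(4, 1, 3, 3)); // false: k is larger than i
    }
}
```

Note that a pair never precedes itself, which is what lets the induction use only scores that are already known.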

We proceed to derive a recursive equation for A(i,j).

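How does the best alignment for a1, a2, ..., ai and b1, b2, ..., bj relate to its predecessors? There are three possible scenarios for the last column of the alignment:

  1. ai is aligned with bj.
  2. bj is aligned with a gap (an indel).
  3. ai is aligned with a gap (an indel).

[Figure: the three cases, where we've used x and y to denote ai and bj.]

Now let's do some analysis for each of these cases:

  1. If ai is aligned with bj, the rest of the alignment must be a best alignment of a1, ..., a(i-1) and b1, ..., b(j-1), so the score is A(i-1,j-1) + s(ai,bj), where s(ai,bj) is the score for pairing ai with bj.
  2. If bj is aligned with a gap, the rest must be a best alignment of a1, ..., ai and b1, ..., b(j-1), so the score is A(i,j-1) + g, where g is the score for an indel.
  3. If ai is aligned with a gap, the rest must be a best alignment of a1, ..., a(i-1) and b1, ..., bj, so the score is A(i-1,j) + g.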

Since we want the highest possible score, we must choose the case that leads to the largest value for the new alignment. A(i,j) is thus the maximum of the three candidate scores, which gives the recurrence

A(i,j) = max{ A(i-1,j-1) + s(ai,bj), A(i,j-1) + g, A(i-1,j) + g }

This immediately suggests the program fragment:

int A(int i, int j) {
  if ...          // ... denotes our yet to be determined base cases
  then return ... // to be determined
  else return max(A(i-1,j-1)+s(a[i],b[j]), A(i,j-1)+g, A(i-1,j)+g);
}
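As a rough sketch of the "else" branch alone, here is the three-way maximum written in compilable Java. Java's Math.max takes only two arguments, so the calls are nested. The class name RecurrenceStep, the method name best, and its parameter names are hypothetical; the base cases are deliberately left out, as in the fragment above.

```java
final class RecurrenceStep {
    // Combines the three predecessor scores into A(i,j).
    //   diagonal = A(i-1,j-1), left = A(i,j-1), up = A(i-1,j)
    //   substitution = s(ai,bj), gap = g
    static int best(int diagonal, int left, int up, int substitution, int gap) {
        int pairBoth = diagonal + substitution; // ai aligned with bj
        int gapAgainstBj = left + gap;          // bj aligned with a gap
        int gapAgainstAi = up + gap;            // ai aligned with a gap
        return Math.max(pairBoth, Math.max(gapAgainstBj, gapAgainstAi));
    }

    public static void main(String[] args) {
        // Predecessors 4, 2, 2 with a mismatch (-1) and an indel score of -2:
        // the candidates are 3, 0 and 0.
        System.out.println(best(4, 2, 2, -1, -2)); // 3
    }
}
```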

In view of what we've just seen, look carefully at my program NW.java. It is a simple implementation of Needleman-Wunsch. Next lab, you will be modifying and extending it. This lab, you'll run it to check your hand-derived arrays. For the next exercises, at least one member of the team should work on producing the matrices by hand, and at least one should adapt and run the program to check the handiwork of their team members.

This program outputs what we'll call a "dynamic programming matrix" that can be used to produce a corresponding alignment (we'll do that next lab). The next exercises expect you to generate small dynamic programming matrices by hand and check them by program.
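To make the idea of a dynamic programming matrix concrete, here is a minimal sketch that fills one in row by row. It is not NW.java, and the names NwMatrixSketch and fill are invented for this illustration. It also assumes the conventional base cases for global alignment, A(i,0) = i*g and A(0,j) = j*g, which the text above leaves undetermined.

```java
final class NwMatrixSketch {
    // Returns the (n+1) x (m+1) matrix whose entry [i][j] is A(i,j),
    // the best score for aligning the first i symbols of a with the
    // first j symbols of b.
    static int[][] fill(String a, String b, int match, int mismatch, int gap) {
        int n = a.length();
        int m = b.length();
        int[][] A = new int[n + 1][m + 1];
        // Assumed base cases: a prefix aligned against nothing is all indels.
        for (int i = 1; i <= n; i++) A[i][0] = i * gap;
        for (int j = 1; j <= m; j++) A[0][j] = j * gap;
        // Every cell depends only on cells that precede it.
        for (int i = 1; i <= n; i++) {
            for (int j = 1; j <= m; j++) {
                int s = a.charAt(i - 1) == b.charAt(j - 1) ? match : mismatch;
                int diagonal = A[i - 1][j - 1] + s;
                int left = A[i][j - 1] + gap;
                int up = A[i - 1][j] + gap;
                A[i][j] = Math.max(diagonal, Math.max(left, up));
            }
        }
        return A;
    }

    public static void main(String[] args) {
        for (int[] row : fill("AC", "TC", 4, -1, -2)) {
            System.out.println(java.util.Arrays.toString(row));
        }
    }
}
```

For the tiny inputs AC and TC with 4 for a match, -1 for a mismatch and -2 for an indel, the rows printed are [0, -2, -4], [-2, -1, -3] and [-4, -3, 3], which is small enough to confirm by hand.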


Exercise 6

Produce dynamic programming matrices for ACCTGCTAC and TCCAGCTTC:
  1. Use 4 for a match, -1 for a mismatch, -2 for an indel.
  2. Check your calculations by running NW.java.
  3. Use 5 for a match, 0 for a mismatch, -4 for an indel.
  4. Check your calculations by modifying NW.java (three small changes are all you need) and running it.
Deliverables: Show me your matrices and the outputs from the programs.

The distance measure is significantly different. It requires minimizing, rather than maximizing, over the northwest-, north- and west-derived scores. You will need to change more than just three lines of NW.java to answer the next (and final (whew!!)) exercise.
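As a hedged sketch of what the minimization looks like for a single cell, the step below uses the costs from the next exercise: 0 for a match and 1 for either a mismatch or an indel. The names DistanceStep and best are hypothetical, and the sketch covers only one cell, not the other changes NW.java needs.

```java
final class DistanceStep {
    // Combines the three predecessor distances into the distance for one cell.
    // sameSymbol says whether the two symbols compared at this cell are equal.
    static int best(int northwest, int north, int west, boolean sameSymbol) {
        int viaNorthwest = northwest + (sameSymbol ? 0 : 1); // match or mismatch
        int viaNorth = north + 1;                            // indel
        int viaWest = west + 1;                              // indel
        return Math.min(viaNorthwest, Math.min(viaNorth, viaWest));
    }

    public static void main(String[] args) {
        // Predecessors 1, 2, 2 with matching symbols: candidates are 1, 3 and 3.
        System.out.println(best(1, 2, 2, true)); // 1
    }
}
```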


Exercise 7

Produce a dynamic programming matrix for ACCTGCTAC and TCCAGCTTC using the distance measure (0 for a match, +1 for either a mismatch or an indel). Check your answer by modifying NW.java and running it.

Deliverable: Show your matrix.



rhyspj@gwu.edu