# I. Perspective

• Dynamic programming is an optimization technique.

• Greedy vs. Dynamic Programming :

• Both techniques are optimization techniques, and both build solutions from a collection of choices of individual elements.

• The greedy method computes its solution by making its choices in a serial forward fashion, never looking back or revising previous choices.

• Dynamic programming computes its solution bottom up by synthesizing them from smaller subsolutions, and by trying many possibilities and choices before it arrives at the optimal set of choices.

• There is no a priori litmus test by which one can tell if the Greedy method will lead to an optimal solution.

• By contrast, there is a litmus test for Dynamic Programming, called The Principle of Optimality

• Divide and Conquer vs. Dynamic Programming:

• Both techniques split their input into parts, find subsolutions to the parts, and synthesize larger solutions from smalled ones.

• Divide and Conquer splits its input at prespecified deterministic points (e.g., always in the middle)

• Dynamic Programming splits its input at every possible split points rather than at a pre-specified points. After trying all split points, it determines which split point is optimal.

# II. Principle of Optimality

• Definition: A problem is said to satisfy the Principle of Optimality if the subsolutions of an optimal solution of the problem are themesleves optimal solutions for their subproblems.
• Examples:

• The shortest path problem satisfies the Principle of Optimality.

• This is because if a,x1,x2,...,xn,b is a shortest path from node a to node b in a graph, then the portion of xi to xj on that path is a shortest path from xi to xj.

• The longest path problem, on the other hand, does not satisfy the Principle of Optimality. Take for example the undirected graph G of nodes a, b, c, d, and e, and edges (a,b) (b,c) (c,d) (d,e) and (e,a). That is, G is a ring. The longest (noncyclic) path from a to d to a,b,c,d. The sub-path from b to c on that path is simply the edge b,c. But that is not the longest path from b to c. Rather, b,a,e,d,c is the longest path. Thus, the subpath on a longest path is not necessarily a longest path.

# III. Steps of Dynamic Programming

• Dynamic programming design involves 4 major steps:

1. Develop a mathematical notation that can express any solution and subsolution for the problem at hand.

2. Prove that the Principle of Optimality holds.

3. Develop a recurrence relation that relates a solution to its subsolutions, using the math notation of step 1. Indicate what the initial values are for that recurrenec relation, and which term signifies the final solution.

4. Write an algorithm to compute the recurrence relation.

• Steps 1 and 2 need not be in that order. Do what makes sense in each problem.

• Step 3 is the heart of the design process. In high level algorithmic design situations, one can stop at step 3. In this course, however, we will carry out step 4 as well.

• Without the Principle of Optimality, it won't be possible to derive a sensible recurrence relation in step 3.

• When the Principle of Optimality holds, the 4 steps of DP are guaranteed to yield an optimal solution. No proof of optimality is needed.

# IV. First Application: The Matrix Chain Problem

• Input: n matrices A1, A2,...,An of dimensions
P1 x P2, P2 x P3, ... , Pn x Pn+1, respectively.

• Goal: to compute the matrix product A1A2...An

• Problem: In what order should A1A2...An be multiplied so that it would take the minimum number of computations to derive the product.

• Matrix multiplication cost:

• Let A and B be two matrices of dimensions p x q and q x r.

• Let C= AB. C is of diemnsions p x r

• Element Cij= Ai1B1j + Ai2B2j + ... + AiqBqj

• Thus Cij takes n scalar multiplications and (n-1) scalar additions.

• Since scalar multiplication is more expensive than scalar addition, we count only the scalar multiplications.

• Thus Cij takes n multiplications.

• Therefore, AB takes p x r x q = pqr multiplications.

• Example of the best way of multiplying 3 matrices:

• takes A1 of dimensions 3 x 5, A2 of dimensions 5 x 7, and A3 of dimensions 7 x 2.

• (A1A2)A3 takes 3*5*7 + 3*7*2=147

• A1(A2A3) takes 5*7*2 + 3*5*2=100

• Thus, A1(A2A3) is much cheaper to compute than (A1A2)A3, although Both lead to the same final answer.

• Exercise: Formulate a greedy method for the Matrix Chain Problem, and prove by a counter example that it does not necessarily lead to an optimal solution.

• A Dynamic programming design:

• Notation: let Mij denote the cost of multiplying Ai...Aj, where the cost is measured in the number of scalar multiplications.

Clearly, M(i,i)=0 for all i, and M(1,n) is what we are looking for.

• Proof of the principle of optimality

• Every way of multiplying a sequence of matrices can be represented by a binary (infix) tree, where the leaves are the matrices, and the internal nodes are intemediary products.

• Let T be the tree correspodning to the optimal way of multiplying Ai...Aj.

• T has a left subtree L and a right subtree R. L corresponds to multiplying B=Ai...Ak, and R to multiplying C=Ak+1 ... Aj, for some integer k (i <= k <= j-1).

• The cost corresponding to T is
cost(T)= cost(R) + Cost(L) + cost(BC).

• for the Principle of Optimality to hold, we need to show that L is the best tree of Ai...Ak, and R is the best tree for Ak+1 ... Aj. It suffices to show it for L.

• The proof is by contradiction. If L were not optimal, then there would be a better tree L' for Ai...Ak.

• Cost(L') < Cost(L).

• Then, takes the tree T' whose left subtree is L' and whose right subtree is R.

• ```Cost(T') = Cost(L') + Cost(R) + Cost(BC)
< Cost(L) + Cost(R) + Cost(BC) = Cost(T)
```

• Thus, Cost(T') < cost(T), which contradicts the fact that T was the best tree for its matrices.

• Therefore, L must be optimal. Q.E.D.

• Derivation of the recurrence relation:

• use the same notation T, L and R for the optimal way of multiplying Ai...Aj. L is the left subtree corresponding to Ai...Ak (for some k), and R corresponds to Ak+1...An.

• ```
Mij=cost(T)
=Cost(L)+Cost(R)+Cost(BC)
=Mik + Mk+1,j + PiPk+1Pj+1.
```

• Since we do not know the right value of k, and since Mij is supposed to be the minimum possible, the relation should be
Mij=min{Mik + Mk+1,j + PiPk+1Pj+1 | i <= k <= j-1}

• Illustration:

• n=4. A1 is 3 x 5, A2 is 5 x 7, A3 is 7 x 3, and A4 is 3 x 4

• P1=3, P2=5, P3=7, P4=3, P5=4

• We will build a table to compute the Mijs bottom up. For each Mij, we record the k that gives Mij its min vakue

•  M11=0 M22=0 M33=0 M44=0 M12=105 M23=105 M34=84 M13=150 k=1 M24=165 k=3 M14=186 k=3

• Optimal multiplication way: (A1(A2A3))A4.

# V. Second Application: The All-Pairs Shortest Path Problem

• Input: A weighted graph, represented by its weight matrix W.

• Problem: find the distance between every pair of nodes.

• Dynamic programming Design:

• Notation: A(k)(i,j) = length of the shortest path from node i to node j where the label of every intermediary node is <= k.

A(0)(i,j) = W[i,j].

• Principle of Optimality: We already saw that any sub-path of a shortest path is a shortest path between its end nodes.

• recurrence relation:

• Divide the paths from i to j where every intermediary node is of label <=k into two groups:

1. Those paths that do go through node k

2. Those paths that do not go through node k.

• the shortest path in the first group is the shortest path from i to j where the label of every intermediary node is <= k-1.

• therefore, the length of the shortest path of group 1 is A(k-1)(i,j)

• Each path in group two consists of two portions: The first is from node i to node k, and the second is from node k to node j.

• the shortest path in group 2 does not goe through K more than once, for otherwise, the cycle around K can be eliminated, leading to a shorter path in group 2.

• Therefore, the two portions of the shortest path in group 2 have their intermediary labels <= k-1.

• Each portion must be the shortest of its kind. That is, the portion from i to k where intermediary node is <= k-1 must be the shortest such a path from i to k. If not, we would get a shorter path in group 2. Same thing with the second portion (from j to k).

• Therefore, the length of the first portion of the shortest path in group 2 is A(k-1)(i,k)

• Therefore, the length of the 2nd portion of the shortest path in group 2 is A(k-1)(k,j)

• Hence, the length of the shortest path in group 2 is A(k-1)(i,k) + A(k-1)(k,j)

• Since the shortest path in the two groups is the shorter of the shortest paths of the two groups, we get

• A(k)(i,j)=min(A(k-1)(i,j), A(k-1)(i,k) + A(k-1)(k,j)).

• The algorithm follows:
```

Procedure APSP(input: W[1:n,1:n];A[1:n,1:n])
begin
for i=1 to n do
for j=1 to n do
A(0)(i,j) := W[i,j];
endfor
endfor
for k=1 to n do
for i=1 to n do
for j=1 to n do
A(k)(i,j)=min(A(k-1)(i,j),A(k-1)(i,k) + A(k-1)(k,j))
endfor
endfor
endfor
end

```

• Note that once A(k) has been computed, there is no need for A(k-1)

• Therefore, we don't need to keep the superscript

• By dropping it, the algorithm remains correct, and we save on space

• the new implementation follows:
```

Procedure APSP(input: W[1:n,1:n];A[1:n,1:n])
begin
for i=1 to n do
for j=1 to n do
A(i,j) := W[i,j];
endfor
endfor
for k=1 to n do
for i=1 to n do
for j=1 to n do
A(i,j)=min(A(i,j),A(i,k) + A(k,j));
endfor
endfor
endfor
end

```

• Time Complexity Analysis:

• The first double for-loop takes O(n2) time.

• The tripl-for-loop that follows has a constant-time body, and thus takes O(n3) time.

• Thus, the whole algorithm takes O(n3) time.

# VI. Third Application: Optimal Binary Search Trees

• Input: a1 < a2 < ... < an
p1 p2 ... pn
q0 q1 q2 ... qn

pi = Prob[ai is accessed], i=1,2,...,n
qi = Prob[accessing an element X, ai < X < ai+1]
q0 = Prob[accessing an element X, X < a1]
qn = Prob[accessing an element X, an < X]

• Problem: Find for the the array a1..n a binary search tree of minimum cost.

• Cost Measure: average search time
When computing the average, take into account both successful and unsuccessful searches.

• Quantitatively, cost of a BST T is C(T) where
C(T)= SUMni=1 picost(ai) +SUMni=0 qicost(X, ai < X < ai+1)
(to simplify the notation, assume a0 = -infinity and an+1 = infinity)

cost(ai) = level(ai) + 1
cost(X, ai < X i+1) = level(Ei)
where

Ei = External node where X would be if it were in the tree

• Notation:
Tij = OBST(ai+1, ... , aj)
Cij = cost(Tij)
Cij = SUMjs=i+1 ps{level Tij(as) + 1} + SUMjs=i qslevel Tij(Es)

T0n is the final tree being sought
T00 is empty

Ti,i+1 is a single-node tree that has element ai+1

• Principle of Optimality:

• Let T0n be an OBST for the elements a1 < a2 < ... < an, and let L and R be its left subtree and right subtree. Suppose that the root of T0n is ak, for some k.

• Clearly, the numbers in the tree L are a1, a2, ... , ak-1
and the numbers in the tree R are ak+1, ak+2, ... , an.

• We need to show that L is an OBST for its elements (and also R is an OBST for its elements).

• It will be shown below that C(T0n)=C(L)+C(R)+ p1+p2+...+pn + q0+q1+q2+...+qn

• That is,
C(T0n)=C(L)+C(R)+W
where
W=p1+p2+...+pn + q0+q1+q2+...+qn

• W is independent of the structure of L and R.

• If L is not an optimal BST for its elements, then we can find a better tree L' for the same elements, where C(L') < C(L). Let T' be the tree with root ak, left subtree L' and right subtree R.

• We have
C(T') = C(L') + C(R) + W < C(L) + C(R) + W < C(T0n)

• That is, T' is better than T0n, contradicting the fact that T0n is an optimal BST. Therefore, L must be an optimal BST for its elements. The same proof applies to R.

• Thus, a subtree of an OBST must be an OBST. This proves the principle of optimality.

• Recurrence Relation for Cij:

1. Tij is rooted at ak, for some k, (i+1 <= k <= j), its left subtree is Ti,k-1 and it right subtree is Tkj.

2. Cij = SUMjs=i+1 ps{level Tij(as) + 1} + SUMjs=i qslevel Tij(Es)

3. Ci,k-1 = SUMk-1s=i+1 ps{level Ti,k-1(as) + 1} +
SUMk-1s=iqslevel Ti,k-1(Es)

4. Ckj = SUMjs=k+1 ps{level Tkj(as) + 1} +
SUMjs=k qslevel Tkj(Es)

5. SUMjs=i+1 ps{level Tij(as) + 1} =
SUMk-1s=i+1 ps{level Tij(as) + 1} +
pk{level Tij(ak + 1} +
SUMjs=k+1 ps{level Tij(as) + 1} =
SUMk-1s=i+1 ps{level Tij(as) + 1} +
pk +
SUMjs=k+1 ps{level Tij(as) + 1}

6. Noting that levelTij(as)= levelTi,k-1(as) + 1 for i <= k-1, and
levelTij(as)= levelTkj(as) + 1, for i >= k+1, we conclude from (5):

7. SUMjs=i+1 ps{level Tij(as) + 1} = SUMk-1s=i+1 ps{level Ti,k-1(as) + 1+1} + pk + SUMjs=k+1 ps{level Ti,k-1(as) + 1+1} =
SUMk-1s=i+1 ps{level Ti,k-1(as) + 1} + SUMjs=k+1 ps{level Ti,k-1(as) + 1} + SUMk-1s=i ps + pk + SUMjs=k+1 ps

SUMjs=i+1 ps{level Tij(as) + 1} = SUMk-1s=i+1 ps{level Ti,k-1(as) + 1} + SUMjs=k+1 ps{level Ti,k-1(as) + 1} + SUMjs=i+1 ps

Similar arithmetic will show that
SUMjs=i qslevel Tij(Es) = SUMk-1s=iqslevel Ti,k-1(Es) + SUMjs=k qslevel Tkj(Es) + SUMjs=i qs

8. Therefore,
Cij = SUMk-1s=i+1 ps{level Ti,k-1(as) + 1} +
SUMjs=k+1 ps{level Ti,k-1(as) + 1} +
SUMjs=i+1 ps +
SUMk-1s=iqslevel Ti,k-1(Es) +
SUMjs=k qslevel Tkj(Es +
SUMjs=i qs =
Ci,k-1 + Ckj + SUMjs=i+1 ps + SUMjs=i qs

9. Cij = Ci,k-1 + Ckj + Wij

Wij = SUMjs=i+1 ps + SUMjs=i qs

10. Since we don't know which k should be at the root, and since we want to minimize Cij, we should take the k that gives the minimum. Therefore,

11. Cij = mini+1 <= k <= j{Ci,k-1 + Ckj + Wij}
Wij = SUMjs=i+1 ps + SUMjs=i qs
Cii = 0, Wii = qi
Denote by rij the index k that gives the minimum to Cij.

• Algorithm:

```

This procedure computes the weights Wijs

Procedure Weight(Input:p[1:n], q[0:n]; Output: W[0:n,0:n])
begin
for i=1 to n do
W[i,i] = q(i);
endfor

for l=1 to n do
for i=0 to n-l do
k = i+l;
W[i,k]=W[i,k-1] + p[k] + q[k];
endfor
endfor
end

```

```

This procedure computes the Cijs and the
rijs

Procedure OBST(Input:p[1:n], q[0:n], W[0:n,0:n];
Output: C[0:n,0:n], r[0:n,0:n])
begin
for i=0 to n do
C[i,i] := 0;
endfor

for l=1 to n do
for i=0 to n-l do
j=i+l;
C[i,j] := infinity;
m := i+1;	--m keeps the index of the min
for k=i+1 to j do
if C[i,j] >= C[i,k-1] + C[k,j] then
C[i,j] := C[i,k-1] + C[k,j];
m := k;
endif
endfor
C[i,j] := C[i,j] + W[i,j];
r[i,j] := m;
endfor
endfor
end

```

```

This procedure creates the tree Tij

Procedure create-tree(Input: r[0:n,0:n], a[1:n], i, j;
Output: T)
begin
if (i==j) then T=null; return; endif
T := new(node);	-- the root of Tij
k := r[i,j];
T --> data := a[k];
if (j==i+1) return; endif
create-tree(r[0:n,0:n], a[1:n], i, k-1; T --> left);
create-tree(r[0:n,0:n], a[1:n], k, j; T --> right);
end

```

```

This procedure is the master program that
creates the whole tree T0n

Procedure Final-tree(Input: a[1:n],p[1:n],q[1:n];
Output: T)
begin
Weight(p[1:n], q[0:n], W[0:n,0:n]);
OBST(p[1:n], q[0:n], W[0:n,0:n],
C[0:n,0:n], r[0:n,0:n]);
create-tree(r[0:n,0:n], a[1:n], 0, n, T);
end

```

• Example:
• a1 < a2 < a3 < a4

p1=1/10, p2=2/10, p3=3/10, p4=1/10

q0=0, q1=1/10, q2=1/20, q3=1/20, q4=1/10

 W00=0 W11=1/10 W22=1/20 W33=1/20 W44=1/10 W01=2/10 W12=3.5/10 W23=4/10 W34=2.5/10 W02=4.5/10 W13=7/10 W24=6/10 W03=8/10 W14=9/10 W04=10/10

 C00=0 C11=0 C22=0 C33=0 C44=0 C01=2/10 r01=1 C12=3.5/10 r12=2 C23=4/10 r23=3 C34=2.5/10 r34=4 C02=6.5/10 r02=2 C13=10.5/10 r13=3 C24=8.5/10 r24=3 C03=14/10 r03=2 C14=15/10 r14=3 C04=19/10 r04=3

• The tree T04 has as root a3 because r04=3

The left subtree is then T02 and the right subtree is T34

T34 is a single-node tree having a4 because r34=4

T02 has as root a2 because r02=2

The left subtree of T02 is T01 and its right subtree is T22 (which is empty)

T01 is a single-node tree having a1 because r01=1

this completes the tree (to be drawn in class).