The Greedy Method
- Greedy Template
- First Application: Selection Sort
- Second Application: Optimal Merge Patterns
- Third Application: The Knapsack Problem
- Fourth Application: Minimum Spanning Trees
- Fifth Application: Single-Source Shortest Paths Problem
I. Greedy Template:
Greedy(input I)
begin
while (solution is not complete) do
Select the best element x in the
remaining input I;
Put x next in the output;
Remove x from the remaining input;
endwhile
end
- The notion of "best" has to be defined in each problem separately.
- Therefore, the essence of each greedy algorithm is its selection policy.
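- The template can be sketched in Python as follows. This is only an illustration: the function name greedy and the parameter best (a caller-supplied selection policy) are assumptions made for the sketch, and the solution is taken to be complete once the input is exhausted.

```python
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")

def greedy(items: Iterable[T], best: Callable[[List[T]], T]) -> List[T]:
    """Repeatedly move the 'best' remaining element to the output."""
    remaining = list(items)      # work on a copy of the input
    output: List[T] = []
    while remaining:             # assumed: complete once the input is exhausted
        x = best(remaining)      # selection policy, supplied by the caller
        output.append(x)         # put x next in the output
        remaining.remove(x)      # remove x from the remaining input
    return output
```

For example, greedy([3, 1, 2], min) uses the policy best=minimum and returns the elements in increasing order.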
II. First Application: Selection Sort
- To sort using the greedy method, have the selection policy select the
minimum of the remaining input. That is, best=minimum.
- The resulting algorithm is a well-known sorting algorithm,
called Selection Sort. It takes O(n^2) time, so it is slower than
O(nlog n) sorting algorithms such as Merge Sort and Heapsort.
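- A minimal in-place Python sketch of Selection Sort. The function name selection_sort is an assumption; the output is built in the front of the same array rather than in a separate list.

```python
def selection_sort(a):
    """Sort list a in place, in increasing order, and return it."""
    n = len(a)
    for i in range(n - 1):
        # Greedy choice: index of the minimum of the remaining input a[i:].
        m = i
        for j in range(i + 1, n):
            if a[j] < a[m]:
                m = j
        # Put the minimum next in the output (position i).
        a[i], a[m] = a[m], a[i]
    return a
```

The two nested loops perform about n^2/2 comparisons, which is where the O(n^2) bound comes from.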
III. Second Application: Optimal Merge Patterns
- Input: n sorted arrays of lengths L[1], L[2],...,L[n]
- Problem: Merge all the arrays into one sorted array as fast as
possible, merging two arrays at a time (merging arrays of lengths a
and b takes O(a+b) time). The problem is to determine which pair to
merge at each step.
- Method (the Greedy method): The selection policy (of which
best pair of arrays to merge next) is to choose the two shortest
remaining arrays.
- Implementation:
- Need a data structure to store the
lengths of the arrays, to find the 2 shortest arrays at
any time, to delete those lengths, and to insert a new length
(for the newly merged array).
- In essence, the data structure
has to support delete-min and insert. Clearly, a min-heap
is ideal.
- Time complexity of the algorithm:
The algorithm iterates (n-1) times. At every iteration
two delete-mins and one insert are performed. The 3
operations take O(log n) time in each iteration.
- Thus the total time
is O(nlog n) for the while loop + O(n) for initial heap
construction.
That is, the total time is O(nlog n).
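- A minimal Python sketch of the method, using the standard heapq module as the min-heap. The function name optimal_merge_cost and the cost model (merging arrays of lengths a and b costs a+b) are assumptions of the sketch; it computes the merge order and its total cost without merging any actual arrays.

```python
import heapq

def optimal_merge_cost(lengths):
    """Return (total_cost, order); order lists the pairs of lengths merged."""
    heap = list(lengths)
    heapq.heapify(heap)                 # O(n) initial heap construction
    total, order = 0, []
    while len(heap) > 1:                # n-1 iterations
        a = heapq.heappop(heap)         # shortest remaining array
        b = heapq.heappop(heap)         # second shortest remaining array
        order.append((a, b))
        total += a + b                  # assumed cost of this merge
        heapq.heappush(heap, a + b)     # length of the newly merged array
    return total, order
```

For the lengths 20, 30, 10, 5, 30 the sketch merges (5,10), (15,20), (30,30), (35,60), for a total cost of 15+35+60+95 = 205.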
IV. Third Application: The Knapsack Problem
- Input: A weight capacity C, and n items of weights W[1:n] and
monetary values P[1:n].
- Problem: Determine which items to take and how much of each item
so that
- the total weight is <= C, and
- the total value (profit) is maximized.
- Formulation of the problem: Let x[i] be the fraction
taken from item i, where 0 <= x[i] <= 1.
The weight of the part taken from item i is x[i]*W[i].
The corresponding profit is x[i]*P[i].
The problem is then to find the values of the array x[1:n]
so that x[1]P[1]+x[2]P[2]+...+x[n]P[n] is maximized
subject to the constraint that
x[1]W[1]+x[2]W[2]+...+x[n]W[n] <= C
- Greedy selection policy: three natural possibilities
- Policy 1: Choose the lightest remaining item, and take
as much of it as can fit.
- Policy 2: Choose the most profitable remaining item,
and take as much of it as can fit.
- Policy 3: Choose the item with the highest price per
unit weight (P[i]/W[i]),
and take as much of it as can fit.
- Exercise: Prove by a counterexample that Policy 1 does
not guarantee an optimal solution. Do the same for Policy 2.
- Policy 3 always gives an optimal solution.
- Example
i:   1 2 3 4
P:   5 9 4 8
W:   1 3 2 2
P/W: 5 3 2 4
C = 4
Solution (Policy 3):
1st: all of item 1, x[1]=1, x[1]*W[1]=1
2nd: all of item 4, x[4]=1, x[4]*W[4]=2
3rd: 1/3 of item 2, x[2]=1/3, x[2]*W[2]=1
Now the total weight is 4=C, so x[3]=0.
The total profit is 5+8+(1/3)*9 = 16.
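- A minimal Python sketch of Policy 3. The function name fractional_knapsack is an assumption, items are indexed from 0 rather than 1, and weights and profits are assumed to be positive integers so that exact fractions can be used.

```python
from fractions import Fraction

def fractional_knapsack(P, W, C):
    """Return (x, profit) for Policy 3; x[i] is the fraction taken of item i."""
    n = len(P)
    # Consider items in decreasing order of profit per unit weight P[i]/W[i].
    order = sorted(range(n), key=lambda i: Fraction(P[i], W[i]), reverse=True)
    x = [Fraction(0)] * n
    room = Fraction(C)                          # remaining capacity
    for i in order:
        if room == 0:
            break
        x[i] = min(Fraction(1), room / W[i])    # take as much of item i as fits
        room -= x[i] * W[i]
    profit = sum(x[i] * P[i] for i in range(n))
    return x, profit
```

On the example with P = 5, 9, 4, 8, W = 1, 3, 2, 2 and C = 4, the sketch takes all of the first and fourth items and 1/3 of the second, for a profit of 16.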
V. Fourth Application: Minimum Spanning Tree (MST)
- Definition of a spanning tree: A spanning tree of a graph
G of n nodes is a tree that has all the n nodes of the graph
such that every edge of the tree is an edge in the graph.
- Definition: if the edges have weights, then the weight of a
tree is the sum of the weights of the edges of the tree.
- Statement of the MST problem:
- Input : a weighted connected graph G=(V,E). The weights are represented
by the 2D array (matrix) W[1:n,1:n], where W[i,j] is
the weight of edge (i,j).
- Problem: Find a minimum-weight spanning tree of G.
- The greedy method for this problem works on the basis
of this selection policy: choose the minimum-weight remaining
edge. If that edge does not create a cycle in the evolving tree,
add it to the tree.
- The greedy MST algorithm:
Procedure ComputeMST(in:G, W[1:n,1:n];out:T)
begin
Put in T the n nodes and no edges;
while T has less than n-1 edges do
Choose a remaining edge e of
minimum weight;
Delete e from the graph;
if (e does not create a cycle in T) then
Add e to T;
endif
endwhile
end ComputeMST
- The main implementation questions are:
- How to find and delete the min-weight edge
- How to tell if an edge creates a cycle in T
- For finding and deleting the min-weight edge, use
a min-heap whose nodes are the labels+weights of
the graph edges.
- For cycle detection, note that
- T is a forest at any given time,
- adding an edge eliminates two trees from the forest and
replaces them by a new tree containing the union of the
nodes of the two old trees, and
- an edge e=(x,y) creates a cycle if both x and y
belong to the same tree in the forest.
- Therefore, a Union-Find data structure is perfect to tell
if an edge creates a cycle, and to keep track of what nodes
belong to each tree in the forest.
- A detailed implementation code of ComputeMST:
Procedure ComputeMST(input:G,W[1:n,1:n];output:T)
begin
integer PARENT[1:n]; /* Union-Find forest; F = Find, U = Union */
initialize PARENT to -1 in each entry; /* every node is a root */
Build a min-heap H[1:|E|] for all the |E| edges;
Put in T the n nodes and no edges;
while T has fewer than n-1 edges do
e=delete-min(H); /* assume e=(x,y)*/
r1 := F(x); r2 := F(y); /* roots of the trees of x and y */
if (r1 != r2) then /* different trees, so no cycle */
Add e to T;
U(r1,r2); /* merge the two trees */
endif
endwhile
end ComputeMST
- Time complexity of the MST algorithm:
- O(|E|) to build the heap
- up to |E| calls to U and F, taking
O(|E|log n) time
- up to |E| calls to delete-min, taking
O(|E|log |E|) time.
- therefore, the total time is O(|E|log |E|).
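- A minimal Python sketch of the detailed ComputeMST procedure above. Several details are assumptions made for the sketch, not taken from the notes: nodes are labeled 0..n-1, the graph is given as a list of (weight, x, y) tuples instead of a matrix, and Union is done by size, with each root storing the negated size of its tree (which agrees with initializing every entry to -1).

```python
import heapq

def compute_mst(n, edges):
    """Return (tree_edges, total_weight) for a connected graph.

    n: number of nodes, labeled 0..n-1
    edges: list of (weight, x, y) tuples
    """
    parent = [-1] * n                    # negative entry: root, value = -(tree size)

    def find(x):                         # F: root of the tree containing x
        while parent[x] >= 0:
            x = parent[x]
        return x

    def union(r1, r2):                   # U: attach the smaller tree to the larger
        if parent[r1] > parent[r2]:      # r1's tree is smaller (sizes are negated)
            r1, r2 = r2, r1
        parent[r1] += parent[r2]
        parent[r2] = r1

    heap = list(edges)
    heapq.heapify(heap)                  # O(|E|) heap construction
    tree, total = [], 0
    while len(tree) < n - 1 and heap:
        w, x, y = heapq.heappop(heap)    # min-weight remaining edge
        r1, r2 = find(x), find(y)
        if r1 != r2:                     # x and y in different trees: no cycle
            tree.append((x, y, w))
            total += w
            union(r1, r2)
    return tree, total
```

Union by size keeps every tree O(log n) deep, which is what gives the O(log n) bound per call to F used in the analysis above.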
- Theorem: the ComputeMST algorithm computes a minimum spanning tree.
- Proof:
- Let T be the tree generated by the algorithm
- Let T' be a minimum spanning tree
- If T=T', done. So assume that T != T'
- Strategy: T' will be transformed to T without a change
of weight
- Let e be a min-weight edge in T-T'
- All edges in T of weight < W(e) are thus in T' as well.
- Adding e to T' creates a cycle e1, e2, ..., ek
- This cycle must have an edge ej that is not in T because T has
no cycles. ej is in T'.
- Claim: W(ej) >= W(e).
We prove the claim by contradiction.
- Assume W(ej) < W(e)
- ComputeMST would process ej before e
- All the edges that are processed before e, are of weight < W(e),
and are entered into T, are also in T' (by the earlier observation
that every edge of T of weight < W(e) is in T')
- Therefore, when ej is processed, it would be found
not to create a cycle because ej and all the edges
of T that preceded it are all in T', and T' does not
have a cycle.
- Thus, ej would have to be added by the algorithm to T,
contradicting the fact that ej is not in T.
- Replace ej by e in T', resulting in a new tree T".
- W(T")=W(T')+W(e)-W(ej) <= W(T').
- Since T' is a minimum spanning tree, W(T") can't be < W(T').
- Therefore, W(T")=W(T').
- That is, T" is an MST and T" differs less from T than T' did.
- This kind of edge replacement operation, which makes T' resemble
T more and more without a change of weight, can be repeated
a finite number of times until T' becomes identical to T.
- Thus, T and T' have the same weight, making T a minimum spanning
tree.
Q.E.D.
VI. Fifth Application: Single-Source Shortest Path problem
- Input: a weighted connected graph G=(V,E), and a node s designated as
a source node. The weights are assumed nonnegative (the correctness
proof relies on this) and are represented
by the 2D array (matrix) W[1:n,1:n], where W[i,j] is
the weight of edge (i,j). If (i,j) is not an edge, W[i,j]=infinity.
Note: W[i,i]=0 for all i.
- Problem: Find the distance between s and every node in the graph.
- The greedy method here requires a few definitions
before it can be formulated.
- Let Y be a set, initially containing the single source node s.
- Definition: A path from s to a node x outside Y is called special
if every intermediary node on the path belongs to Y.
- Let DIST[1:n] be a real array where
DIST[i]=the length of the shortest special path from s to i
- Greedy selection policy: choose from all the nodes still outside
Y the node of minimum DIST value, and add it to Y.
- The claim, which will be proved later, is that every node in Y has its
DIST value equal to the distance from it to s.
- The SSSP algorithm:
Procedure SSSP(in W[1:n,1:n], s;out DIST[1:n]);
begin
for i =1 to n do
DIST[i] := W[s,i];
endfor
/* implement Y as a Boolean array Y[1:n] */
/* Y[i]= 1 if i belongs to set Y, 0 otherwise */
Boolean Y[1:n]; /* initialized to 0*/
Y[s] := 1; /* add s to set Y */
for num =2 to n do
choose a node u outside Y such that
DIST[u] = min{DIST[i] | Y[i] = 0};
Y[u] := 1; /* Add u to Y;*/
/*update the DIST values of the other nodes*/
for all nodes w where Y[w] = 0 do
DIST[w] := min(DIST[w],DIST[u]+W[u,w]);
endfor
endfor
end
- Time Complexity of the SSSP algorithm:
- the 1st for-loop clearly takes O(n) time
- Choosing u takes O(n) time, because it involves finding
a minimum in an array.
- The innermost for-loop for updating DIST has a constant-time
body, and iterates at most n times, thus takes O(n) time.
- Therefore, the for-loop iterating over num takes O(n*n) time
- Thus, SSSP takes O(n^2) time.
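- A minimal Python sketch of the SSSP procedure above. The function name sssp is an assumption, nodes are indexed from 0 instead of 1, non-edges are represented by math.inf, and the weights are assumed nonnegative.

```python
import math

def sssp(W, s):
    """Return DIST, where DIST[i] is the distance from s to i.

    W: n x n matrix of nonnegative weights, math.inf for non-edges, W[i][i] = 0
    s: source node (0-based)
    """
    n = len(W)
    dist = [W[s][i] for i in range(n)]    # shortest special paths when Y = {s}
    in_y = [False] * n
    in_y[s] = True                        # add s to set Y
    for _ in range(n - 1):
        # Greedy choice: the node outside Y with minimum DIST value, O(n).
        u = min((i for i in range(n) if not in_y[i]), key=lambda i: dist[i])
        in_y[u] = True
        # Update the DIST values of the nodes still outside Y, O(n).
        for w in range(n):
            if not in_y[w]:
                dist[w] = min(dist[w], dist[u] + W[u][w])
    return dist
```

For instance, with W = [[0, 4, 1, inf], [4, 0, 2, 5], [1, 2, 0, 8], [inf, 5, 8, 0]] and s = 0, the sketch returns [0, 3, 1, 8]: node 1 is reached through node 2, and node 3 through nodes 2 and 1.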
- Theorem: When a node u enters Y, we have
DIST[u] = distance(s,u).
- Proof:
- The proof is by induction on the number k of elements in Y.
- Basis: k=1. That is, Y has only node s.
Then DIST[s]=W[s,s]=0; also, distance(s,s)=0. Thus,
DIST[s]=distance(s,s).
- Induction: assume the theorem holds for every node v that
had entered Y before u. Prove that the theorem holds for u which
is selected by the algorithm to be the next node to enter Y.
We do so by contradiction.
- Assume that DIST[u] != distance(s,u). Since DIST[u] is the length
of an actual path from s to u, it cannot be less than the distance.
That is, distance(s,u) < DIST[u].
- This means the shortest path
from s to u (call that path P) is not a special path.
- This implies
that at some point, P exits Y going through some intermediary
node(s) before reaching u.
- Let z be the first node outside Y that
P goes through, right when P exits Y.
- Then, the portion of P from
s to z, which we'll call Q, is a special path from s to z.
- We now have
DIST[z] <= length(Q) <= length(P) = distance(s,u) < DIST[u]
where length(Q) <= length(P) holds because the weights are nonnegative.
That is, DIST[z] < DIST[u].
- This contradicts the fact that
the algorithm
chose the min-DIST node u to enter Y.
- Therefore, DIST[u] = distance(s,u).
Q.E.D.