Divide and Conquer
I. Template for Divide and Conquer
II. First Application: Mergesort
III. Second Application: Quicksort
IV. Third Application: The Order Statistics Problem
I. Template for Divide and Conquer
divide&conquer(input I)
begin
if (size of input is small enough) then
solve directly;
return;
endif
divide I into two or more parts I1, I2,...;
call divide&conquer(I1) to get a subsolution S1;
call divide&conquer(I2) to get a subsolution S2;
...
Merge the subsolutions S1, S2,...into a
global solution S;
end
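As an illustration only, the template can be instantiated for a very simple problem: finding the largest element of a list. The function name `dc_max` and the 0-based inclusive bounds `lo` and `hi` are assumptions made for this sketch; they do not come from the notes.

```python
def dc_max(values, lo, hi):
    """Return the largest element of values[lo..hi] (inclusive bounds)."""
    # Base case: the input is small enough to solve directly.
    if lo == hi:
        return values[lo]
    # Divide: split the range into two halves.
    mid = (lo + hi) // 2
    # Conquer: solve each half recursively to get two subsolutions.
    left_best = dc_max(values, lo, mid)
    right_best = dc_max(values, mid + 1, hi)
    # Merge: combine the subsolutions into the global solution.
    return left_best if left_best >= right_best else right_best


data = [7, 2, 9, 4, 1]
print(dc_max(data, 0, len(data) - 1))
```

The three commented steps (base case, divide and conquer, merge) correspond one-to-one to the parts of the template.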
II. First Application: Mergesort
Procedure Mergesort(input A[1:n], i,j; output B[1:n]) sorts A[i:j] and puts the
result in B[i:j].
Procedure Mergesort (input A[1:n], i,j; output B[1:n])
begin
Datatype C[1:n];
If i=j then B[i] = A[i]; Return; endif
Mergesort (A,i,(i+j)/2;C); /* sorts the first half*/
Mergesort(A,(i+j)/2 +1,j;C); /* sorts the second half*/
Merge(C,i,j;B); /* merges the two sorted halves
into a single sorted list */
end
Procedure Merge(input: C i,j; output: B)
begin
int k=(i+j)/2;
int u,v,w; /* u will scan C[i:k],
v will scan C[k+1:j], and
w will index the output B */
u=i;
v=k+1;
w=u;
while (u <= k and v <= j) do
if C[u] <= C[v] then
B[w]=C[u]; u++;w++;
else
B[w]=C[v]; v++;w++;
endif
endwhile
If u > k then
Put C[v:j] in B[w:j];
Elseif v > j then
Put C[u:k] in B[w:j];
endif
end
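A minimal Python sketch of the two procedures above, under a few assumptions that are not in the notes: indices are 0-based, and instead of a scratch array C of full length in every call, the two sorted halves are copied out of B with slices just before merging. The helper signature `merge(left, right, b, w)` is therefore a variation on the pseudocode's Merge(C,i,j;B), not a transcription of it.

```python
def merge(left, right, b, w):
    """Merge the sorted lists left and right into b, starting at index w."""
    u = v = 0
    while u < len(left) and v < len(right):
        if left[u] <= right[v]:
            b[w] = left[u]
            u += 1
        else:
            b[w] = right[v]
            v += 1
        w += 1
    # Exactly one of the two lists still has elements; copy them over.
    rest = left[u:] if u < len(left) else right[v:]
    b[w:w + len(rest)] = rest


def mergesort(a, i, j, b):
    """Sort a[i..j] (inclusive bounds) into b[i..j]; a is not modified."""
    if i == j:
        b[i] = a[i]
        return
    k = (i + j) // 2
    mergesort(a, i, k, b)        # b[i..k] now holds the sorted first half
    mergesort(a, k + 1, j, b)    # b[k+1..j] now holds the sorted second half
    # The slices are copies, so writing into b does not disturb the inputs.
    merge(b[i:k + 1], b[k + 1:j + 1], b, i)


a = [5, 2, 9, 1, 5, 6]
b = [None] * len(a)
mergesort(a, 0, len(a) - 1, b)
print(b)
```

Using `<=` in the comparison keeps equal elements in their original order, so the sort is stable.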
III. Second Application: Quicksort
- take an arbitrary element of the input array A[1:n] as the partition element. Assume we take the leftmost element, A[1];
- partition A[1:n] around A[1] into two parts: the left part consisting of elements <= A[1], and the right part
consisting of elements > A[1];
- sort the left part and the right part recursively;
- now the array is sorted without the need for any merging;
Partition (in/out A[p:q]): partitions the array
A[p:q] and returns the position where the partition
element A[p] ends up
function Partition(in/out A[p:q])
begin
int i,j;
real a=A[p];
i=p; j=q;
while (i <= j) do
while (i <= q && A[i] <= a) do /* test the bound first so that A[q+1] is never read */
i++;
endwhile
while (A[j] > a) do
j--;
endwhile
if i < j then
swap (A[i],A[j]);
i++;
j--;
endif
endwhile
swap(A[p],A[j]);
return(j);
end
- Time Complexity Analysis of Partition:
- The array A is scanned from the left and from the right (by i and j) until i and j meet (or cross by one position)
- thus, A is scanned wholly once. Each element takes constant time to process (one comparison).
- therefore, Partition(A[1:n]) takes O(n) time (or cn time)
- code for Quicksort:
Quicksort(input/output A[1:n]; input: p,q) sorts the array A[p:q]
Procedure Quicksort(in/out A[1:n]; in: p,q)
begin
int r;
if (p >= q) return; // zero or one element to sort: there is nothing to do.
r := Partition(A[p:q]);
Quicksort(A,p,r-1);
Quicksort(A,r+1,q);
end
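The Partition and Quicksort procedures can be sketched in Python as follows. This is an illustrative version with 0-based inclusive bounds, not a tuned implementation; the recursion stops when `p >= q` so that an empty left or right part (which occurs whenever the partition element ends up at either end) is handled safely.

```python
def partition(a, p, q):
    """Partition a[p..q] around a[p]; return the final index of that element."""
    pivot = a[p]
    i, j = p, q
    while i <= j:
        # The bound is tested first so that a[i] is never read past q.
        while i <= q and a[i] <= pivot:
            i += 1
        # a[p] equals the pivot, so this scan stops at p at the latest.
        while a[j] > pivot:
            j -= 1
        if i < j:
            a[i], a[j] = a[j], a[i]
            i += 1
            j -= 1
    # j now marks the last element <= pivot; move the pivot there.
    a[p], a[j] = a[j], a[p]
    return j


def quicksort(a, p, q):
    """Sort a[p..q] in place."""
    if p >= q:          # zero or one element: nothing to do
        return
    r = partition(a, p, q)
    quicksort(a, p, r - 1)
    quicksort(a, r + 1, q)


data = [3, 1, 4, 1, 5, 9, 2, 6]
quicksort(data, 0, len(data) - 1)
print(data)
```

Because the leftmost element is always the partition element, an already sorted input drives this sketch into its worst case, with one recursive call per element.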
- Time Complexity Analysis of Quicksort(A[1:n]):
- T(n)=T(r-1)+T(n-r)+cn, where r is the final position of the partition element
- the worst case is r=1 (it happens, for example, when A is already sorted)
- in that case, T(n)=T(n-1)+cn, yielding T(n)=O(n^2)
- This makes Quicksort slow in the worst case
- However, in practice, it is one of the fastest sorting algorithms
- Something is NOT right. What is it?
- Average-Case Time Complexity Analysis of Quicksort(A[1:n]):
- Let Ta(n) denote the average time of Quicksort(A[1:n])
- the final position r of the partition element can be 1,2,..., or n
- therefore,
T(n) = T(0)+T(n-1)+cn, or
T(n) = T(1)+T(n-2)+cn, or
T(n) = T(2)+T(n-3)+cn, or
T(n) = T(3)+T(n-4)+cn, or
...
T(n) = T(n-1)+T(0)+cn
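- Assuming each of the n positions is equally likely, the average value of T(n) is the sum of all the possible values divided by n
- Each of T(1),...,T(n-1) appears twice in that sum, and T(0)=0, so Ta(n) =(2/n)(T(1)+T(2)+...+T(n-1)) + cn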
- We can safely assume that each recursive call takes average time. Therefore:
Ta(n) =(2/n)(Ta(1)+Ta(2)+...+Ta(n-1)) + cn
- Multiplying by n, we have n*Ta(n)=2(Ta(1)+Ta(2)+...+Ta(n-1)) + cn^2
- Substituting n-1 for n throughout, we get
- (n-1)*Ta(n-1)=2(Ta(1)+Ta(2)+...+Ta(n-2)) + c(n-1)^2
- Subtracting the second equation from the first: n*Ta(n) - (n-1)*Ta(n-1) = 2Ta(n-1) + c(2n-1)
- n*Ta(n) = (n+1)*Ta(n-1) + c(2n-1) < (n+1)*Ta(n-1) + 2cn = (n+1)*Ta(n-1) + c'n
- Thus, n*Ta(n) < (n+1)*Ta(n-1) + c'n
- Dividing by n(n+1), we obtain:
- Ta(n)/(n+1) < Ta(n-1)/n + c'/(n+1)
- Define X(n) = Ta(n)/(n+1). Then we have:
- X(n) < X(n-1) +c'/(n+1)
X(n) < X(n-1) + c'/(n+1)
X(n-1) < X(n-2) + c'/n
X(n-2) < X(n-3) + c'/(n-1)
...
X(1) < X(0) + c'/2
- Adding these inequalities and cancelling, with X(0)=Ta(0)=0, we get: X(n) < c'(1/2+1/3+...+1/(n+1)) < c' ln(n+1) = c' ln(2) log(n+1) = c" log(n+1), where log is the base-2 logarithm
- Therefore
Ta(n)=(n+1)*X(n) < c"(n+1)log(n+1)
Ta(n) = O(n log n)
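As a numerical sanity check of the derivation (not a proof), the average-case recurrence can be evaluated directly and compared with the bound c'(n+1) ln(n+1) obtained by summing the harmonic terms. The choices c = 1 (hence c' = 2c = 2) and Ta(0) = 0, as well as the function name `average_cost`, are assumptions of this sketch.

```python
import math


def average_cost(n_max, c=1.0):
    """Return [Ta(0), ..., Ta(n_max)] for Ta(n) = (2/n)*(Ta(0)+...+Ta(n-1)) + c*n."""
    ta = [0.0]           # Ta(0) = 0: nothing to sort
    running_sum = 0.0    # holds Ta(0) + ... + Ta(n-1)
    for n in range(1, n_max + 1):
        value = 2.0 * running_sum / n + c * n
        ta.append(value)
        running_sum += value
    return ta


ta = average_cost(1000)
for n in (10, 100, 1000):
    bound = 2.0 * (n + 1) * math.log(n + 1)   # c' = 2 when c = 1
    print(n, round(ta[n], 1), round(bound, 1))
```

For every n printed, the computed average stays below the bound, in line with Ta(n) = O(n log n).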
IV. Third Application: The Order Statistics Problem
- Input: a real array A[1:n], and an integer k (1 <= k <= n)
- Problem: Compute the k-th smallest element of A
- A brute-force method sorts A and then returns the k-th element of the sorted array.
- This takes O(n log n).
- This method is overkill: it orders the whole array when only one element is needed. Is there a faster method?
- A divide and conquer method:
function select(A[1:n],k)
begin
if n=1 then
return(A[1]);
endif
r := Partition(A[1:n]);
case
k=r: return(A[r]);
k < r: return(select(A[1:r-1],k));
k > r: return(select(A[r+1:n],k-r));
endcase
end
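A possible Python rendering of select, again with 0-based inclusive bounds `p` and `q` in place of the array slices used in the pseudocode (an assumption of this sketch). The rank `k` stays 1-based, as in the problem statement, and is always relative to the current subarray.

```python
def partition(a, p, q):
    """Partition a[p..q] around a[p]; return the final index of that element."""
    pivot = a[p]
    i, j = p, q
    while i <= j:
        while i <= q and a[i] <= pivot:
            i += 1
        while a[j] > pivot:
            j -= 1
        if i < j:
            a[i], a[j] = a[j], a[i]
            i += 1
            j -= 1
    a[p], a[j] = a[j], a[p]
    return j


def select(a, p, q, k):
    """Return the k-th smallest (1-based) element of a[p..q]; reorders a."""
    if p == q:
        return a[p]
    r = partition(a, p, q)
    rank = r - p + 1              # rank of the partition element within a[p..q]
    if k == rank:
        return a[r]
    if k < rank:
        return select(a, p, r - 1, k)
    return select(a, r + 1, q, k - rank)


data = [9, 1, 8, 2, 7, 3, 6, 4, 5]
print(select(list(data), 0, len(data) - 1, 5))
```

Only one side of the partition is explored in each call, which is what distinguishes select from Quicksort.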
- Time complexity of select: T(n)=max(1,T(r-1),T(n-r)) + cn, because only one side of the partition is searched;
- Worst case: r=1 (or r=n). In that case, T(n)=T(n-1)+cn, yielding
T(n)=O(n^2).
- This is worse than the O(n log n) of the sorting-based method that we set out to beat.
- The reason for this problem is that there is no control over the sizes of the two sides of the partition output.
- That is, the uninformed choice of the partitioning element (A[1]) does not prevent a badly unbalanced partition.
- To remedy the problem, one needs to choose the partition element more carefully.
- The new select method will be called Quickselect.
- Method for a wise choice of the partitioning element:
Function wisepartitionelt(A[1:n])
begin
if n < 5 then
sort A and return its middle element;
endif
int m=n/5; /* integer division */
Divide the array into groups of five each:
A[1:5], A[6:10],...; /* up to four leftover elements are ignored */
Sort each group; assume now that A[1:5]
is sorted, A[6:10] is sorted, ...;
Let B[1:m] be the array of the middles of
the sorted groups: A[3],A[8],A[13],...;
/* next, find the median of B, that is,
the ceil(m/2)-th smallest element of B */
v := Quickselect(B[1:m],ceil(m/2));
return(v);
end
- Quickselect( ) will be like Select( ), except for the wise choice of the partitioning element;
function Quickselect(A[1:n],k)
begin
if n=1 then
return(A[1]);
endif
v := wisepartitionelt(A[1:n]);
find v in the array A, and swap
it with A[1];
/* now A[1] is the wise value v*/
r := Partition(A[1:n]);
case
k=r: return(A[r]);
k < r: return(Quickselect(A[1:r-1],k));
k > r: return(Quickselect(A[r+1:n],k-r));
endcase
end
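The two routines above can be sketched together in Python. Several details are assumptions of this sketch rather than part of the notes: bounds are 0-based and inclusive, a final group with fewer than five elements contributes its own middle element instead of being ignored, and the names `wise_partition_element` and `quickselect` are illustrative.

```python
def partition(a, p, q):
    """Partition a[p..q] around a[p]; return the final index of that element."""
    pivot = a[p]
    i, j = p, q
    while i <= j:
        while i <= q and a[i] <= pivot:
            i += 1
        while a[j] > pivot:
            j -= 1
        if i < j:
            a[i], a[j] = a[j], a[i]
            i += 1
            j -= 1
    a[p], a[j] = a[j], a[p]
    return j


def wise_partition_element(a, p, q):
    """Return the median of the middle elements of groups of five from a[p..q]."""
    middles = []
    for start in range(p, q + 1, 5):
        group = sorted(a[start:min(start + 5, q + 1)])
        middles.append(group[(len(group) - 1) // 2])
    # The median of the middles is itself found with quickselect.
    return quickselect(middles, 0, len(middles) - 1, (len(middles) + 1) // 2)


def quickselect(a, p, q, k):
    """Return the k-th smallest (1-based) element of a[p..q]; reorders a."""
    if p == q:
        return a[p]
    v = wise_partition_element(a, p, q)
    idx = a.index(v, p, q + 1)    # locate v inside a[p..q]
    a[p], a[idx] = a[idx], a[p]   # now a[p] is the wise value v
    r = partition(a, p, q)
    rank = r - p + 1
    if k == rank:
        return a[r]
    if k < rank:
        return quickselect(a, p, r - 1, k)
    return quickselect(a, r + 1, q, k - rank)


data = [9, 1, 8, 2, 7, 3, 6, 4, 5]
print(quickselect(list(data), 0, len(data) - 1, 5))
```

The mutual recursion terminates because the list of middles is strictly shorter than the range it was built from whenever that range has at least two elements.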
- Time complexity T(n) of Quickselect:
- Time of wisepartitionelt is O(n)+ T(n/5)
- Hence, T(n) = max(T(r-1),T(n-r))+T(n/5) + cn
- Theorem: n/4 < r < 3n/4 and n/4 < n-r < 3n/4
- Proof: Lay out the sorted groups (of 5) as columns, each sorted from top (smallest) to bottom (largest), and put the columns in an order that makes
their middle elements sorted from left to right. Here v is the median of the middle elements:
x  x  ...  x  x  ...  x
x  x  ...  x  x  ...  x
x  x  ...  v  x  ...  x
x  x  ...  x  x  ...  x
x  x  ...  x  x  ...  x
- The "red" elements are those in the top three rows of the columns from the leftmost column through the column of v. They are all <= v, and so they fall among the r elements (v included) that are <= v.
Since there are about 3*(n/5)/2=3n/10 red elements, we conclude that
3n/10 <= r. Because n/4 < 3n/10, we get n/4 < r.
- The "blue" elements are those in the bottom three rows of the columns from the column of v through the rightmost column. Apart from v itself they are all > v (assuming distinct elements), and so they fall in the part of n-r elements that are > v.
Since there are about 3*(n/5)/2=3n/10 blue elements, we conclude that
3n/10 <= n-r. Because n/4 < 3n/10, we get n/4 < n-r. This implies that r < 3n/4.
- In conclusion, we have n/4 < r < 3n/4. This in turn implies that
n/4 < n-r < 3n/4. Q.E.D.
- From the theorem we conclude that
T(r-1) <= T(3n/4) and T(n-r) <= T(3n/4).
- Therefore, the time formula for T(n) becomes:
T(n) <= T(3n/4) + T(n/5) +cn
- Theorem: T(n) <= 20cn
- Proof: By induction on n.
- Basis step: n=1. T(1)=c (because Quickselect just returns A[1]). Since c <= 20c*1, we have
T(1) <= 20c*1
- Induction step: Assume T(m) <= 20cm for all m <= n-1. Prove that T(n) <= 20cn.
The induction hypothesis applies to m=3n/4 and to m=n/5 because 3n/4 <= n-1 and n/5 <= n-1.
Hence, T(3n/4) <= 20c(3n/4)=15cn
And T(n/5) <= 20c(n/5)=4cn.
As a result, we have
T(n) <= T(3n/4) + T(n/5) +cn <= 15cn + 4cn +cn = 20cn
That is, T(n) <= 20cn. Q.E.D.
- It follows from the last theorem that T(n) = O(n).
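The bound can also be checked numerically. The sketch below evaluates the recurrence with equality, that is, the largest cost the inequality T(n) <= T(3n/4) + T(n/5) + cn allows, and compares it with 20cn. Rounding 3n/4 and n/5 down, taking T(0) = 0 and setting c = 1 are assumptions of this check, as is the function name `t`.

```python
from functools import lru_cache

C = 1  # the constant c from the analysis (assumed value)


@lru_cache(maxsize=None)
def t(n):
    """T(n) = T(3n/4) + T(n/5) + C*n with floors, T(0) = 0 and T(1) = C."""
    if n == 0:
        return 0
    if n == 1:
        return C
    return t(3 * n // 4) + t(n // 5) + C * n


for n in (10, 1000, 100000):
    print(n, t(n), 20 * C * n)
```

In each printed row the middle value stays below the last one, as the theorem predicts. The sum of the two fractions, 3/4 + 1/5 = 19/20, is less than 1, and that is what makes the total work linear.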