VECTOR QUANTIZATION
Abdou Youssef
- Background and Motivation
- The Main Technique of VQ
- VQ Issues
- Sizes of Codebooks and Codevectors
- Codevector Size Optimization
- Construction of Codebooks (The Linde-Buzo-Gray Algorithm)
- Initial Codebook
- Codebook Structure (m-ary Trees)
- Advanced VQ
1. Background and Motivation
- Scalar quantization is insensitive to inter-pixel correlations
- Scalar quantization not only fails to exploit correlations, it also
destroys them, thus hurting the image quality
- Therefore, quantizing correlated data requires alternative
quantization techniques that exploit and largely preserve correlations
- Vector quantization (VQ) is such a technique
- VQ is a generalization of scalar quantization: It quantizes vectors
(contiguous blocks) of pixels rather than individual pixels
- VQ can be used as a standalone compression technique operating directly on
the original data (images or sounds)
- VQ can also be used as the quantization stage of a general lossy compression
scheme, especially where the transform stage does not decorrelate
completely, such as in certain applications of wavelet transforms
2. The Main Technique of VQ
- Build a dictionary or "visual alphabet", called a codebook, of
  codevectors. Each codevector is a (1D or 2D) block of n pixels
- Coding
- Partition the input into blocks (vectors) of n pixels
- For each vector u, search the codebook for the best matching
codevector û, and code u by the index of
û in the codebook
- Losslessly compress the indices
- Decoding (A simple table lookup)
- Losslessly decompress the indices
- Replace each index i by codevector i of the codebook
- The codebook has to be stored/transmitted
- The codebook can be generated on an image-per-image basis or a
class-per-class basis
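The coding/decoding loop above can be sketched in a few lines of NumPy (a minimal illustration; `vq_encode` and `vq_decode` are hypothetical names, and the best match is found by brute-force nearest-neighbor search under squared error):

```python
import numpy as np

def vq_encode(vectors, codebook):
    """Map each input vector to the index of its best-matching codevector
    (brute-force nearest neighbor under squared error)."""
    # d2[i, j] = squared distance from vectors[i] to codebook[j]
    d2 = ((vectors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

def vq_decode(indices, codebook):
    """Decoding is a simple table lookup: replace each index i by
    codevector i of the codebook."""
    return codebook[indices]

# Toy example: vectors of n = 2 pixels, a codebook of Nc = 2 codevectors
codebook = np.array([[0.0, 0.0], [10.0, 10.0]])
vectors = np.array([[1.0, 2.0], [9.0, 11.0]])
indices = vq_encode(vectors, codebook)
reconstructed = vq_decode(indices, codebook)
```

In a full coder the `indices` array would then be losslessly compressed, as described above.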
3. VQ Issues
- Codebook size (# of codevectors) Nc
- Codevector size n
- Codebook construction: what codevectors to include?
- Codebook structure: for faster best-match searches
- Global or local codebooks: class- or image-oriented VQ?
4. Sizes of Codebooks and Codevectors
(Tradeoffs)
- A large codebook size Nc allows for representing more features,
leading to better reconstruction quality
- But a large Nc causes more storage and/or transmission
- A small Nc has the opposite effects
- Typical values for Nc: 2^7, 2^8, 2^9, 2^10, 2^11
- A larger codevector size n exploits inter-pixel correlations better
- But n should not be larger than the extent of spatial correlations
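As a quick illustration of the size tradeoff: if the indices are stored as plain log2(Nc)-bit integers (ignoring the lossless coding stage and the codebook overhead), the index rate in bits per pixel works out to log2(Nc)/n:

```python
from math import log2

def vq_index_rate_bpp(Nc, n):
    """Bits per pixel spent on codebook indices: each n-pixel vector is
    replaced by one log2(Nc)-bit index (codebook overhead not counted)."""
    return log2(Nc) / n

# Nc = 2^8 codevectors, 4x4 blocks (n = 16 pixels): 8/16 = 0.5 bpp
rate = vq_index_rate_bpp(2 ** 8, 16)
```

Doubling Nc adds only one bit per index, while doubling n halves the rate, which is why both sizes are tuned together.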
5. Codevector Size Optimization
6. Construction of Codebooks
(The Linde-Buzo-Gray Algorithm)
- Main idea
- Start with an initial codebook of Nc vectors;
- Form Nc classes from a set of training vectors:
put each training vector v in Class i if the
i-th initial codeword is the closest match to v;
Note: The training set is the set of all the blocks
of the image being compressed. For global codebooks,
the training set is the set of all the blocks of
a representative subset of images selected from
the class of images of the application.
- Repeatedly restructure the classes: compute the new centroids
  of the current classes, then put each training vector v in the
  class of v's closest new centroid;
- Stop when the total distortion (the sum of the differences
  between the training vectors and their centroids) ceases to
  change much;
- Take the most recent centroids as the codebook.
- The algorithm in detail
  1. Start with a set of training vectors and an initial codebook
     Û1(1), Û2(1), ..., ÛNc(1). Initialize the iteration index k
     to 1 and the initial distortion D(0) to infinity.
  2. For each training vector v, find the closest Ûi(k):
     d(v,Ûi(k)) = min{ d(v,Ûj(k)) | j = 1,2,...,Nc },
     where d(v,w) is the (Euclidean or MSE) distance between the
     vectors v and w; put v in class i.
  3. Compute the new total distortion D(k):
     D(k) = Σ_{i=1..Nc} Σ_{v in class i} d(v,Ûi(k))
  4. If |(D(k-1)-D(k))/D(k-1)| < TOLERANCE, convergence is reached;
     stop and take the most recent Û1(k), Û2(k), ..., ÛNc(k) to be
     the codebook. Otherwise, go to step 5.
  5. Compute the new class centroids (vector means):
     Ûi(k+1) := (1/|class i|) Σ_{v in class i} v,  for i = 1,2,...,Nc;
     then increment k and go back to step 2.
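A minimal NumPy sketch of the iteration above, assuming squared-Euclidean distortion and leaving empty classes unchanged (one common convention):

```python
import numpy as np

def lbg(training, codebook, tolerance=1e-3, max_iter=100):
    """Linde-Buzo-Gray iteration: classify each training vector, measure
    the total distortion, then move each codevector to the centroid of
    its class; stop when the distortion stops changing much."""
    codebook = codebook.astype(float).copy()
    prev_D = np.inf
    for _ in range(max_iter):
        # Step 2: assign each training vector to its closest codevector
        d2 = ((training[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
        classes = d2.argmin(axis=1)
        # Step 3: total (squared-error) distortion of this classification
        D = d2[np.arange(len(training)), classes].sum()
        # Step 4: stop when the relative drop in distortion is small
        if np.isfinite(prev_D) and abs(prev_D - D) <= tolerance * prev_D:
            break
        prev_D = D
        # Step 5: move each codevector to the centroid of its class
        for i in range(len(codebook)):
            members = training[classes == i]
            if len(members) > 0:      # leave empty classes unchanged
                codebook[i] = members.mean(axis=0)
    return codebook

# Two well-separated clusters: the codebook converges to their means
data = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]])
cb = lbg(data, np.array([[0.0, 0.5], [1.0, 1.0]]))
```

The double loop over classes is written for clarity; production implementations vectorize the centroid update as well.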
7. Initial Codebook
- Three methods for constructing an initial codebook
- The random method
- Pairwise Nearest Neighbor Clustering
- Splitting
- The random method:
- Choose Nc vectors at random from the training set
- Pairwise Nearest Neighbor Clustering:
- Make each training vector a cluster of its own
- Repeat the following until the number of clusters becomes Nc:
  merge the two clusters whose centroids are closest to one
  another, and compute the merged cluster's new centroid
- Take the centroids of the Nc clusters as the initial codebook
- Splitting
  1. Compute the centroid X1 of the training set
  2. Perturb X1 to get X2 (e.g., X2 = 0.99*X1)
  3. Apply LBG on the current initial codebook to get
     an optimum codebook
  4. Perturb each codevector to double the size of the codebook
  5. Repeat steps 3 and 4 until the number of codevectors reaches Nc
- In the end, the Nc codevectors form the whole desired codebook
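The pairwise nearest neighbor method can be sketched as follows (a deliberately naive O(N^3) version, fine for illustration but not for large training sets; `pnn_initial_codebook` is a hypothetical name):

```python
import numpy as np

def pnn_initial_codebook(training, Nc):
    """Pairwise nearest neighbor clustering: every training vector starts
    as its own cluster; repeatedly merge the two clusters whose centroids
    are closest, until Nc clusters remain."""
    # Each cluster is (centroid, weight); merging two is a weighted mean.
    clusters = [(v.astype(float), 1) for v in training]
    while len(clusters) > Nc:
        # Find the pair of clusters with the closest centroids
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d2 = np.sum((clusters[a][0] - clusters[b][0]) ** 2)
                if best is None or d2 < best[0]:
                    best = (d2, a, b)
        _, a, b = best
        (ca, wa), (cb, wb) = clusters[a], clusters[b]
        merged = ((wa * ca + wb * cb) / (wa + wb), wa + wb)
        clusters = [c for i, c in enumerate(clusters) if i not in (a, b)]
        clusters.append(merged)
    # The Nc remaining centroids form the initial codebook
    return np.array([c for c, _ in clusters])

data = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]])
init_cb = pnn_initial_codebook(data, 2)
```

Tracking each cluster's weight makes the merged centroid the true mean of all the training vectors it absorbed.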
8. Codebook Structure (m-ary Trees)
- Tree design and construction
- Start with codebook as the leaves
- Repeat until you construct the root
- cluster all the nodes of the current level into m-node
clusters
- create a parent node for each cluster of m nodes
- Searching for a best match of a vector v in the tree
- Search down the tree, always following the branch that
incurs the least MSE
- The search time is logarithmic (rather than linear) in the
codebook size
- Refined Trees
- Tapered trees: The number of children per node increases
as one moves down the tree
- Pruned trees: Eliminate the codevectors that contribute little to
  distortion reduction
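The greedy top-down search can be sketched as follows (a toy two-level binary tree; it assumes internal nodes store the centroid of the codevectors beneath them, which is one common design):

```python
import numpy as np

class Node:
    """Tree node: internal nodes carry the centroid of the codevectors
    beneath them; leaves carry a codevector and its codebook index."""
    def __init__(self, centroid, children=(), index=None):
        self.centroid = centroid
        self.children = list(children)
        self.index = index

def tree_search(root, v):
    """Greedy search: at each level follow the child whose centroid is
    closest to v (least squared error). Time is logarithmic in the
    codebook size, but the result is not guaranteed to be the global
    best match."""
    node = root
    while node.children:
        node = min(node.children,
                   key=lambda c: np.sum((v - c.centroid) ** 2))
    return node.index

# Toy binary (m = 2) tree over a 4-entry codebook
codebook = np.array([[0.0, 0.0], [0.0, 2.0], [10.0, 10.0], [10.0, 12.0]])
leaves = [Node(cv, index=i) for i, cv in enumerate(codebook)]
left = Node(codebook[:2].mean(axis=0), children=leaves[:2])
right = Node(codebook[2:].mean(axis=0), children=leaves[2:])
root = Node(codebook.mean(axis=0), children=[left, right])
best = tree_search(root, np.array([9.0, 10.0]))
```

Each query compares v against only m centroids per level instead of all Nc codevectors, which is where the logarithmic search time comes from.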
9. Advanced VQ
- Prediction/Residual VQ
- Predict each vector
- Compute the residual vectors
- VQ-code the residual vectors
- Mean/Residual VQ (M/R VQ)
- Compute the mean of each vector and subtract it from the vector
- VQ-code the residual vectors
- Code the means using DPCM and scalar quantization
- Remark: Once the means are subtracted from the vectors,
many vectors become very similar, thus requiring fewer
codevectors to represent them
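The mean-removal step of M/R VQ is easy to illustrate (`mr_split` is a hypothetical helper name; the DPCM + scalar quantization of the means is omitted):

```python
import numpy as np

def mr_split(vectors):
    """Mean/Residual split: subtract each vector's mean from the vector.
    The means would be coded separately (DPCM + scalar quantization,
    not shown); VQ then only has to cover the zero-mean residuals."""
    means = vectors.mean(axis=1, keepdims=True)
    return means.ravel(), vectors - means

# Two vectors that differ only in their mean yield identical residuals,
# so a single codevector can represent both of them
v = np.array([[1.0, 2.0, 3.0, 4.0], [11.0, 12.0, 13.0, 14.0]])
means, residuals = mr_split(v)
```

This is exactly the effect noted in the remark above: removing the means collapses many distinct vectors onto the same residual.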
- Interpolation/residual VQ (I/R VQ)
- Subsample the image by choosing every l-th pixel
- Code the subsampled image using scalar quantization
- Upsample the image using bilinear interpolation
- VQ-code the residual (original - upsampled) image
- Remark: residuals have fewer variations, leading to smaller
  codebooks
- Gain/Shape VQ (G/S VQ)
- Normalize all vectors to have unit gain (unit variance)
- Code the gains using scalar quantization
- VQ-code the normalized vectors