TRANSFORMS in LOSSY COMPRESSION
Abdou Youssef

Contents
- Motivation for Transforms
- Desirable Transforms
- Different Perspectives of Transforms
- Matrix Formulation of Transforms
- Transform-Based Lossy Compression
- The Matrix of the Fourier Transform
- The Matrix of the Discrete Cosine Transform (DCT)
- The Matrix of the Hadamard Transform
- The Matrix of the Walsh Transform
- The Matrix of the Haar Transform
- Vector Space Perspective
- Relationship between the Vector Basis and the Matrix Formulation of Transforms
- Visualization of Basis Images of the Various Transforms
- Frequency Perspective: The Fourier Transform
- Connection with The Human Visual System
- Connection to Compression
- Treatment of Discrete Signals (Discrete Fourier Transform)
- Why Use DCT rather than DFT (Boundary Problems of the Fourier Transforms)
- Relation of DCT to FFT
- Statistical Perspective
- DCT vs. KL
1. Motivation for Transforms
- Why transform the data?
  - To decorrelate the data so that fast scalar (rather than slow vector)
    quantization can be used
  - To better exploit the characteristics of the human visual system (HVS)
    by separating the data into vision-sensitive parts and
    vision-insensitive parts
  - To compact most of the "energy" into a few coefficients, so that most
    of the coefficients can be discarded, thus achieving compression
2. Desirable Transforms
- Desirable properties of transforms
  - Data decorrelation, exploitation of the HVS, and energy compaction
  - Data-independence (same transform for all data)
  - Speed
  - Separability (for fast transform of multidimensional data)
- Various transforms achieve those properties to various extents
  - Fourier Transform
  - Discrete Cosine Transform (DCT)
  - Other Fourier-like transforms: Haar, Walsh, Hadamard
  - Wavelet transforms
- The Karhunen-Loeve Transform
  - Optimal w.r.t. data decorrelation and energy compaction
  - But it is data-dependent
    (the transform matrix is different for different input data)
  - And slow, because the transform matrix has to be computed every time
  - Therefore, KL is only of theoretical interest to data compression
3. Different Perspectives of Transforms
- Statistical perspective
- Frequency perspective
- Vector space perspective
- End-use perspective (matrix formulation)
4. Matrix Formulation of Transforms
- Simply stated, a transform is a multiplication of the input signal
  by the transform matrix
- Each of the standard transforms mentioned earlier is defined by an
  N×N square non-singular matrix A_N
- The transform of a 1D discrete input signal (a column vector x of N
  components) is the computation of y = A_N x
- The transform of an N×M image X is the transform of each column
  followed by the transform of each row. In matrix form, the transform
  of image X is the computation of Y = A_N X A_M^t
- The inverse transform is simply x = A_N^{-1} y for 1D signals, and
  X = A_N^{-1} Y (A_M^{-1})^t for images
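These formulas can be sketched in plain Python. The 2×2 orthogonal matrix below (the 2-point Hadamard/Haar matrix) is chosen only for illustration; any non-singular A_N works.

```python
import math

# A small orthogonal transform matrix, used only as an illustration.
# Because it is orthogonal, its inverse is simply its transpose.
s = 1 / math.sqrt(2)
A = [[s, s], [s, -s]]

def mat_vec(M, v):
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]

def mat_mat(P, Q):
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q))) for j in range(len(Q[0]))]
            for i in range(len(P))]

def transpose(M):
    return [list(row) for row in zip(*M)]

# 1D: y = A_N x, inverse x = A_N^{-1} y (= A^t y here, since A is orthogonal)
x = [3.0, 1.0]
y = mat_vec(A, x)
x_back = mat_vec(transpose(A), y)

# 2D: Y = A_N X A_M^t, inverse X = A_N^{-1} Y (A_M^{-1})^t
X = [[1.0, 2.0], [3.0, 4.0]]
Y = mat_mat(mat_mat(A, X), transpose(A))
X_back = mat_mat(mat_mat(transpose(A), Y), A)
```

The roundtrips x → y → x and X → Y → X recover the inputs exactly (up to floating-point error), since the transform is invertible.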
5. Transform-Based Lossy Compression
- Compression of an image X:
  - Transform X, yielding Y = A_N X A_M^t
  - Scalar-quantize Y, yielding Y'
  - Losslessly compress Y', yielding a bit stream B
- Image reconstruction:
  - Losslessly decompress B back to Y'
  - Dequantize Y', yielding an approximation Y'' of Y
  - Inverse-transform Y'', yielding a reconstructed image
    X' = A_N^{-1} Y'' (A_M^{-1})^t
- Except for the KL transform, the characterizing matrix A_N of each of
  the standard transforms is independent of the input; that is, A_N is
  the same for all 1D signals and images
- In the following, the matrix A_N of each transform will be defined for
  arbitrary N, and then A_2, A_4 and A_8 will be shown
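The pipeline above can be sketched end to end in a few lines of Python (omitting the lossless coding step, which does not affect the reconstruction). The 2×2 matrix, the tiny "image", and the quantization step q are all arbitrary choices for illustration.

```python
import math

s = 1 / math.sqrt(2)
A = [[s, s], [s, -s]]          # small orthogonal transform (illustration only)

def matmul(P, Q):
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q))) for j in range(len(Q[0]))]
            for i in range(len(P))]

def transpose(P):
    return [list(r) for r in zip(*P)]

X = [[52.0, 55.0], [61.0, 59.0]]             # a tiny 2x2 "image"

# 1. Transform: Y = A X A^t
Y = matmul(matmul(A, X), transpose(A))

# 2. Scalar-quantize with step q (this is the lossy step)
q = 4.0
Yq = [[round(v / q) for v in row] for row in Y]

# (lossless coding of Yq into a bit stream would happen here)

# 3. Dequantize: Y'' approximates Y
Ydq = [[v * q for v in row] for row in Yq]

# 4. Inverse transform: X' = A^{-1} Y'' (A^{-1})^t = A^t Y'' A (A orthogonal)
Xrec = matmul(matmul(transpose(A), Ydq), A)

err = max(abs(Xrec[i][j] - X[i][j]) for i in range(2) for j in range(2))
print(err)   # small: reconstruction error is bounded by the quantization step
```

Because the transform is orthogonal, the reconstruction error is controlled directly by the quantization step: coarser q means more compression and more distortion.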
6. The Matrix of the Fourier Transform
- The matrix A_N = (a_kl) for the DFT:
  a_kl = (1/sqrt(N)) e^(-2πi·kl/N)
  (sign and normalization conventions vary)
- Remark: A_N^{-1} is the conjugate transpose of A_N
7. The Matrix of the Discrete Cosine Transform (DCT)
- The matrix A_N = (a_kl) for DCT:
  a_kl = c_k sqrt(2/N) cos((2l+1)kπ/(2N)),
  where c_0 = 1/sqrt(2) and c_k = 1 for k ≥ 1
  (the orthonormal DCT-II convention)
- Remark: A_N^{-1} = A_N^t
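A quick numeric check of the DCT matrix and of the remark A_N^{-1} = A_N^t, assuming the orthonormal DCT-II convention stated above:

```python
import math

def dct_matrix(N):
    """Orthonormal DCT-II matrix:
    a[k][l] = c_k * sqrt(2/N) * cos((2l+1) k pi / (2N)),
    with c_0 = 1/sqrt(2) and c_k = 1 for k >= 1."""
    A = []
    for k in range(N):
        c = 1 / math.sqrt(2) if k == 0 else 1.0
        A.append([c * math.sqrt(2.0 / N) * math.cos((2 * l + 1) * k * math.pi / (2 * N))
                  for l in range(N)])
    return A

A8 = dct_matrix(8)

# Verify the remark A_N^{-1} = A_N^t: the rows must be orthonormal.
for i in range(8):
    for j in range(8):
        dot = sum(A8[i][l] * A8[j][l] for l in range(8))
        assert abs(dot - (1.0 if i == j else 0.0)) < 1e-12
print("A8 is orthogonal")
```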
8. The Matrix of the Hadamard Transform
- The matrix A_N = (a_kr) for the Hadamard Transform is defined as follows:
  - Let k = k_{n-1} k_{n-2} ... k_0 in binary
  - Let r = r_{n-1} r_{n-2} ... r_0 in binary
  - a_kr = (-1)^(k_{n-1}·r_{n-1} + k_{n-2}·r_{n-2} + ... + k_0·r_0)
- Alternatively, the Hadamard matrix can be defined recursively:
  H_1 = (1), H_2N = [[H_N, H_N], [H_N, -H_N]]
- Remark: with the entries scaled by 1/sqrt(N), A_N^{-1} = A_N^t = A_N
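The two definitions can be checked against each other in a few lines. The recursion used below, H_2N = [[H_N, H_N], [H_N, -H_N]], is the standard Sylvester construction, which I am assuming matches the recursion in the original figure.

```python
def hadamard_bitwise(N):
    """a[k][r] = (-1)^(k_{n-1} r_{n-1} + ... + k_0 r_0): the exponent counts
    the bit positions where both k and r have a 1, i.e. popcount(k & r)."""
    return [[(-1) ** bin(k & r).count("1") for r in range(N)] for k in range(N)]

def hadamard_recursive(N):
    """H_1 = [1]; H_2N = [[H_N, H_N], [H_N, -H_N]] (unnormalized form)."""
    H = [[1]]
    while len(H) < N:
        H = [row + row for row in H] + [row + [-v for v in row] for row in H]
    return H

# The bitwise and recursive definitions produce the same matrix.
assert hadamard_bitwise(8) == hadamard_recursive(8)
```

Note that with ±1 entries, H H^t = N·I; the remark A_N^{-1} = A_N^t = A_N holds for the matrix scaled by 1/sqrt(N).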
9. The Matrix of the Walsh Transform
- The matrix A_N = (a_kr) for the Walsh Transform is defined as follows:
  - Let k = k_{n-1} k_{n-2} ... k_0 in binary
  - Let r = r_{n-1} r_{n-2} ... r_0 in binary
  - a_kr = (-1)^(k_{n-1}·r_0 + k_{n-2}·r_1 + ... + k_0·r_{n-1})
- Remark: with the entries scaled by 1/sqrt(N), A_N^{-1} = A_N^t = A_N
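Under the bit-pairing definition above, the Walsh matrix is just the Hadamard matrix with its rows reindexed in bit-reversed order, which the following sketch verifies numerically:

```python
def walsh_bitwise(N):
    """a[k][r] = (-1)^(k_{n-1} r_0 + k_{n-2} r_1 + ... + k_0 r_{n-1}):
    like Hadamard, but k's bits are paired with r's bits in reverse order."""
    n = N.bit_length() - 1
    def exponent(k, r):
        return sum(((k >> (n - 1 - i)) & 1) * ((r >> i) & 1) for i in range(n))
    return [[(-1) ** exponent(k, r) for r in range(N)] for k in range(N)]

def bit_reverse(k, n):
    """Reverse the n-bit binary representation of k."""
    return int(bin(k)[2:].zfill(n)[::-1], 2)

N, n = 8, 3
W = walsh_bitwise(N)
H = [[(-1) ** bin(k & r).count("1") for r in range(N)] for k in range(N)]

# Each Walsh row k equals the Hadamard row whose index is k with bits reversed.
for k in range(N):
    assert W[k] == H[bit_reverse(k, n)]
```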
10. The Matrix of the Haar Transform
- The matrix A_N = (a_kl) for the Haar Transform, where N = 2^n
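The Haar matrix itself was given as a figure. The sketch below uses one common normalized recursive definition (an assumption, since conventions for the Haar matrix vary) and verifies that, like the other transforms here, it satisfies A_N^{-1} = A_N^t:

```python
import math

def haar_matrix(N):
    """One common normalized Haar matrix (N = 2^n), built recursively:
    A_1 = [1];  A_2N = (1/sqrt 2) * [[A_N kron (1, 1)], [I_N kron (1, -1)]].
    This is an assumed convention; definitions of the Haar matrix vary."""
    A = [[1.0]]
    while len(A) < N:
        s = 1 / math.sqrt(2)
        n = len(A)
        # Top block: each row of A_N with every entry duplicated (row kron (1, 1))
        top = [[s * v for v in row for _ in (0, 1)] for row in A]
        # Bottom block: row i has +s at column 2i and -s at column 2i+1
        bot = [[s if j == 2 * i else (-s if j == 2 * i + 1 else 0.0)
                for j in range(2 * n)] for i in range(n)]
        A = top + bot
    return A

A8 = haar_matrix(8)
# Orthonormality check: A_8 A_8^t = I, so A_8^{-1} = A_8^t.
for i in range(8):
    for j in range(8):
        dot = sum(A8[i][l] * A8[j][l] for l in range(8))
        assert abs(dot - (1.0 if i == j else 0.0)) < 1e-12
```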
11. Vector Space Perspective
- Analog signals are treated as an infinite-dimensional functional
  vector space
- Finite discrete signals are treated as finite-dimensional vector
  spaces
- In either case, the vector space has a basis {e_k | k = 0, 1, ...}
- A transform of a signal x is a linear decomposition of x
  along the basis {e_k}:
  - x = y_0 e_0 + y_1 e_1 + y_2 e_2 + ..., where the {y_k} are
    real/complex numbers
  - Transform: x → (y_k)_k
  - (y_k)_k is a representation of x
- Compression-related desirable properties of a vector-space basis:
  - Correspondence with the human visual system
    - Specifically, only a very small number of basis vectors
      are relevant to (i.e., visible by) the HVS, while
      the majority of the basis vectors are invisible to the HVS
  - Uncorrelated decomposition coefficients (y_k)_k
12. Relationship between the Vector Basis
and the Matrix Formulation of Transforms
- Consider finite 1D discrete signals of N components
  - They form an N-dimensional vector space R^N,
    where every vector is a column vector
  - Any basis consists of N linearly independent column vectors
    e_0, e_1, ..., e_{N-1}
  - For any signal x = (x_0 x_1 ... x_{N-1})^t,
    x = y_0 e_0 + y_1 e_1 + ... + y_{N-1} e_{N-1}
  - That is,
    (x_0 x_1 ... x_{N-1})^t = (e_0 e_1 ... e_{N-1}) (y_0 y_1 ... y_{N-1})^t
  - In matrix form, let
    - B = (e_0 e_1 ... e_{N-1}), an N×N matrix
    - A = B^{-1}
    - y = (y_0 y_1 ... y_{N-1})^t, an N-dimensional column vector
    - x = (x_0 x_1 ... x_{N-1})^t, an N-dimensional column vector
    then the decomposition equation above is expressed as:
    x = By
  - Equivalently, y = Ax,
    where the columns of A^{-1} are the basis column vectors
    e_0, e_1, ..., e_{N-1}
- Consider now N×M images
  - They form an NM-dimensional vector space R^{N×M}
  - Any basis consists of NM matrices E_kr of dimensions N×M
  - Following the same analysis as above, a transform basis is
    E_kr = (column k of A_N^{-1})·(column r of A_M^{-1})^t = e_k e_r^t,
    for k = 0, 1, ..., N-1 and r = 0, 1, ..., M-1
  - X = Σ_{k,r} Y_kr E_kr over all k and r
- Remark: For analog signals, the vector space is infinite-dimensional,
  and its basis is the infinite set of sine and cosine waves, to
  be addressed later
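The basis-image identity X = Σ_{k,r} Y_kr E_kr can be checked numerically. The 2×2 orthogonal matrix below is an illustrative choice; since it is orthogonal, column k of A^{-1} = A^t is row k of A.

```python
import math

s = 1 / math.sqrt(2)
A = [[s, s], [s, -s]]      # a small orthogonal transform; A^{-1} = A^t

def matmul(P, Q):
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q))) for j in range(len(Q[0]))]
            for i in range(len(P))]

def transpose(P):
    return [list(r) for r in zip(*P)]

X = [[1.0, 2.0], [3.0, 4.0]]
Y = matmul(matmul(A, X), transpose(A))       # transform coefficients Y_kr

# Basis images E_kr = e_k e_r^t, where e_k is column k of A^{-1} = A^t,
# i.e. row k of A.
def E(k, r):
    return [[A[k][i] * A[r][j] for j in range(2)] for i in range(2)]

# Reconstruct X as the sum of Y_kr E_kr over all k and r.
Xrec = [[sum(Y[k][r] * E(k, r)[i][j] for k in range(2) for r in range(2))
         for j in range(2)] for i in range(2)]

assert all(abs(Xrec[i][j] - X[i][j]) < 1e-12 for i in range(2) for j in range(2))
```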
13. Visualization of Basis Images of the Various Transforms
- The real part of the basis vectors of FFT for N=8
- The basis vectors of DCT for N=8
- The basis vectors of Hadamard transform for N=8
- The basis vectors of Haar transform for N=8
14. Frequency Perspective: The Fourier Transform
15. Connection with The Human Visual System
- In the Fourier series expansion x(t) = Σ_k y_k e^(2πikt/T) of a signal
  of period T, the frequencies are k/T for all integers k
- The higher |k|, the higher the frequency
- In that expansion, y_k is the k-th frequency content of x(t)
- Experiments have shown that
  - suppressing y_k (along with y_{-k}) for any high frequency
    k causes HARDLY VISIBLE or NO VISIBLE change to x(t)
  - suppressing y_k (along with y_{-k}) for some low frequency
    k causes VISIBLE changes to x(t)
- 1D Illustration: notice the minor changes in the shape of the plots as
  high frequencies are dropped, and the big change when the lowest
  frequency is dropped
- Another illustration: decreasing ability to resolve (detect)
  changes/differences/contrast at increasingly higher frequencies
- A third illustration: decreasing ability to resolve (detect)
  changes/differences/contrast at increasingly lower frequencies, keeping
  the "field of view" constant ([0, π])
- Thus, the HVS is sensitive to moderate-to-low frequency data but
  insensitive to high-frequency data or very low frequency data
16. Connection to Compression ("First Cut")
- Facts:
  - The truncated series x_r(t) = Σ_{|k| ≤ r} y_k e^(2πikt/T)
    is a good mathematical and visual approximation of x(t)
  - The faster the decay of y_k, the smaller r can be
- Thus, the few retained coefficients (y_k)_{|k| ≤ r} form a very small
  approximate representation of x, leading to high compression
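This energy-compaction effect can be demonstrated numerically. The sketch below uses a DCT instead of the Fourier series (a deliberate swap, to stay with real arithmetic), and the smooth test signal and the cut-off r = 8 are arbitrary illustrative choices.

```python
import math

N = 32
# A smooth test signal: a Gaussian bump plus a slow sine.
x = [math.exp(-((l - 12) / 8.0) ** 2) + 0.3 * math.sin(2 * math.pi * l / N)
     for l in range(N)]

def dct(x):
    """Orthonormal DCT-II (direct O(N^2) evaluation)."""
    N = len(x)
    return [(math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)) *
            sum(x[l] * math.cos((2 * l + 1) * k * math.pi / (2 * N))
                for l in range(N))
            for k in range(N)]

def idct(y):
    """Inverse of the orthonormal DCT-II (x = A^t y)."""
    N = len(y)
    return [sum((math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)) *
                y[k] * math.cos((2 * l + 1) * k * math.pi / (2 * N))
                for k in range(N))
            for l in range(N)]

y = dct(x)
r = 8                                   # keep only the r lowest-frequency coefficients
y_trunc = y[:r] + [0.0] * (N - r)
x_approx = idct(y_trunc)

err = max(abs(a - b) for a, b in zip(x, x_approx))
kept_energy = sum(v * v for v in y[:r]) / sum(v * v for v in y)
print(err, kept_energy)   # small error; most energy sits in the first r coefficients
```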
17. Treatment of Discrete Signals (Discrete Fourier Transform)
- Sample N values (x_l) of x(t) at N equally spaced points:
  x_l = x(lT/N), for l = 0, 1, ..., N-1
- Since (x_l) is discrete and finite, there is no need to keep an
  infinite number of y_k's; rather, y_0, y_1, ..., y_{N-1} are sufficient
- DFT: (x_l)_l → (y_k)_k
- Put in matrix form: y = A_N x, where
  a_kl = (1/sqrt(N)) e^(-2πi·kl/N) (normalization conventions vary)
- Again, for large k, y_k can be suppressed or heavily quantized
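A sketch of the DFT as the matrix multiplication y = A_N x, assuming the unitary 1/sqrt(N) normalization (sign and scaling conventions vary):

```python
import cmath
import math

def dft_matrix(N):
    """Unitary DFT matrix: a[k][l] = exp(-2*pi*i*k*l/N) / sqrt(N)."""
    return [[cmath.exp(-2j * math.pi * k * l / N) / math.sqrt(N)
             for l in range(N)]
            for k in range(N)]

N = 8
A = dft_matrix(N)
x = [1.0, 2.0, 0.0, -1.0, 3.0, 0.5, -2.0, 1.5]

# Forward transform: y = A_N x
y = [sum(A[k][l] * x[l] for l in range(N)) for k in range(N)]

# Inverse: since A is unitary, A^{-1} is its conjugate transpose.
x_back = [sum(A[k][l].conjugate() * y[k] for k in range(N)) for l in range(N)]

assert all(abs(x_back[l] - x[l]) < 1e-10 for l in range(N))
```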
18. Why Use DCT rather than DFT (Boundary Problems of the Fourier Transforms)
- Discontinuities at the boundaries cause large high-frequency contents
- Eliminating those frequency contents causes boundary artifacts (known
  as the Gibbs phenomenon, ringing, echoing, etc.)
19. Relation of DCT to FFT
- Let (x_l), l = 0, 1, ..., N-1, be an original signal, and (y_k) its
  DCT transform
- Shuffle x to become almost symmetric; that is, create
  a new signal (x'_l) by taking the even-indexed terms followed
  by the reverse of the odd-indexed terms:
  - x'_l = x_{2l} and x'_{N-l-1} = x_{2l+1}, for l = 0, 1, ..., N/2-1
- y' = DFT(x')
- y_k = Re(e^(-iπk/(2N)) · y'_k), for k = 0, 1, ..., N-1
  (up to the DCT normalization constants c_k)
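The shuffle-and-DFT recipe can be verified numerically. The phase factor e^(-iπk/(2N)) and the orthonormal scaling c_k used below are the standard Makhoul formulation, which I am assuming matches the intended derivation; the DFT is evaluated naively in O(N²), whereas in practice an FFT is used.

```python
import cmath
import math

def dct(x):
    """Direct orthonormal DCT-II."""
    N = len(x)
    return [(math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)) *
            sum(x[l] * math.cos((2 * l + 1) * k * math.pi / (2 * N))
                for l in range(N))
            for k in range(N)]

def dct_via_dft(x):
    """DCT computed from the DFT of the shuffled signal."""
    N = len(x)
    # Shuffle: even-indexed terms, then the odd-indexed terms reversed.
    xs = ([x[2 * l] for l in range(N // 2)] +
          [x[2 * l + 1] for l in range(N // 2 - 1, -1, -1)])
    # DFT of the shuffled signal.
    Y = [sum(xs[l] * cmath.exp(-2j * math.pi * k * l / N) for l in range(N))
         for k in range(N)]
    # Recover the DCT: y_k = c_k * Re(e^{-i pi k / (2N)} * Y_k).
    return [(math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)) *
            (cmath.exp(-1j * math.pi * k / (2 * N)) * Y[k]).real
            for k in range(N)]

x = [3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0]
a, b = dct(x), dct_via_dft(x)
assert all(abs(u - v) < 1e-10 for u, v in zip(a, b))
```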
20. Statistical Perspective
- Decorrelation of data leads to energy compaction, that is, concentrating the
visual contents into a few coefficients.
- Decorrelation of data minimizes the distortions caused by scalar
quantization
- The Karhunen-Loeve (KL) transform decorrelates the signal data completely,
and thus compacts the energy into the minimum number of coefficients
- The MSE_k between the original signal and the one reconstructed from
  the k most important transform coefficients is minimized when
  the transform is KL
- Drawback of KL: the transform matrix is data-dependent, that is, for each
new signal X, the transform matrix A is different and depends on
the statistical properties of X.
- KL is a costly ideal for decorrelation/energy compaction/minimization of MSE
- For other transforms, the more a transform decorrelates the data, the better
the energy compaction and compression performance
- DCT does a good job in decorrelation
21. DCT vs. KL
- For most natural signals, the KL basis and the DCT basis
are almost identical
- Therefore, DCT is near optimal (in decorrelation, energy
  compaction, and rms distortion) because KL is optimal
- Unlike KL, DCT is not signal-dependent
- Hence the popularity of DCT