JPEG AND MPEG STANDARDS
Abdou Youssef

Motivation for Standards

Image/Video Compression Standards (Outline)

The JPEG ``Toolkit''

The Baseline JPEG Algorithm

The Quantization Matrices

Coding of the DC Residuals

Coding of the AC Terms (The AC Huffman Table)

Examples

Example Huffman Table for Lena

Coding the Example Block

Decoding

Extended JPEG

Performance of JPEG

MPEG (1 & 2): Basic Concepts

Modes of MPEG Compression

Types of Frames in MPEG

Interframe Compression of P and B Blocks

Flowchart of MPEG Compression

Motion Estimation and Prediction

Motion Estimation and Prediction for P Frames

Motion Estimation and Prediction for B Frames

MPEG2

References

Links to other Standards

Motivation for Standards
 Why Standards?
 Compatibility
 Production cost reduction
 Triggering growth and product development
 Do JPEG/MPEG standards kill research?
 Not necessarily
 Those standards are very flexible regarding the encoder design,
leaving much room for improvement
 The trends are toward even greater flexibility in future
generations of the standards
 Another growth area is the development of systems
which integrate components that use the standards

Image/Video Compression Standards (Outline)
 JPEG
 Baseline JPEG
 Extended JPEG
 Lossless JPEG (DPCM + Huffman/Arithmetic)
 MPEG
 History (H.261 and H.263)
 MPEG1
 MPEG2
 Why not MPEG3
 MPEG4

The JPEG ``Toolkit''
 JPEG provides a ``toolkit'' of techniques for compressing continuous-tone,
still, color and monochrome images
 Baseline JPEG provides a DCT-based algorithm, and uses
run-length encoding and Huffman coding
 Baseline JPEG operates only in sequential mode, and is restricted to
8 bits/pixel input images
 Extended JPEG offers several optional enhancements:
 12-bit/pixel input
 Progressive transmission
 Choice between Arithmetic and Huffman coding
 Adaptive quantization
 Tiling
 Still picture interchange file format (SPIFF)
 Selective refinement
 Applications of JPEG
 Desktop publishing, color fax, photojournalism,
medical images, general image archiving systems, consumer imaging,
graphic arts, and others

The Baseline JPEG Algorithm
 It operates on 8×8 blocks of the input image
 Mean-normalization (subtract 128 from each pixel)
 Transform: DCT-transform each block
 Quantization
 An 8×8 quantization matrix Q is user-provided
 Each block is divided by Q (point by point)
 The terms are then rounded to their nearest integers
 Remark: Up to 4 quantization matrices per image are
allowed (for example, one for luminance, and one for each
of the three color components)
 Entropy-coding of the DC coefficients (the top-left coefficient of each quantized block) using DPCM+Huffman
 Huffman-encode the DC residuals derived from
the difference between each DC and the DC of the preceding
block
 Entropy-coding of the AC (i.e., non-DC) coefficients
 Zigzag-order the quantized coefficients of each block
 Record for each nonzero coefficient both its distance (called
run) to the preceding nonzero coefficient in the zigzag sequence,
and its value (called level)
 Huffman-code the [run,level] terms using one single
Huffman table for all the AC's of the image
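The normalization, transform, and quantization steps above can be sketched in Python as follows. This is a minimal, unoptimized illustration and not a conforming JPEG encoder: the function names are made up for this sketch, the orthonormal 2-D DCT-II scaling is assumed (the one for which the DC term equals the block sum divided by 8), and the uniform matrix q16 is a hypothetical stand-in for a real Q.

```python
import math

N = 8

def normalize(block):
    """Mean-normalize: subtract 128 from each 8-bit pixel."""
    return [[p - 128 for p in row] for row in block]

def dct_8x8(block):
    """Orthonormal 2-D DCT-II of an 8x8 block (direct O(N^4) form)."""
    def c(u):
        return 1 / math.sqrt(2) if u == 0 else 1.0
    out = [[0.0] * N for _ in range(N)]
    for u in range(N):            # vertical frequency
        for v in range(N):        # horizontal frequency
            s = 0.0
            for x in range(N):    # row
                for y in range(N):  # column
                    s += (block[x][y]
                          * math.cos((2 * x + 1) * u * math.pi / 16)
                          * math.cos((2 * y + 1) * v * math.pi / 16))
            out[u][v] = 0.25 * c(u) * c(v) * s
    return out

def quantize(coeffs, q):
    """Divide point by point by Q and round to the nearest integer."""
    return [[round(coeffs[u][v] / q[u][v]) for v in range(N)]
            for u in range(N)]

# Usage: a flat block of value 136 has a single nonzero coefficient (the DC).
flat = [[136] * N for _ in range(N)]
q16 = [[16] * N for _ in range(N)]   # hypothetical uniform Q, for illustration
dq = quantize(dct_8x8(normalize(flat)), q16)
print(dq[0][0])   # DC = (64 pixels * 8) / 8 / 16 = 4
```

Python's built-in round resolves exact ties to the even integer; an actual encoder may break ties differently.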

The Quantization Matrices
 They are user-provided
 They can be computed using the contrast sensitivity
function of the human visual system (HVS)
 Their values are 8-bit integers
 They provide control over the bit rate by scaling them by a constant
factor
 Example of Q:
16  11  10  16  24  40  51  61

12  12  14  19  26  58  60  55

14  13  16  24  40  57  69  56

14  17  22  29  51  87  80  62

18  22  37  56  68  109  103  77

24  35  55  64  81  104  113  92

49  64  78  87  103  121  120  101

72  92  95  98  112  100  103  99
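As a rough illustration of bit-rate control by scaling, the sketch below multiplies the example Q by a constant factor. The helper name scale_q is hypothetical, and clamping the result to 1..255 is an assumption made here so that the entries remain nonzero 8-bit integers.

```python
Q = [[16, 11, 10, 16, 24, 40, 51, 61],
     [12, 12, 14, 19, 26, 58, 60, 55],
     [14, 13, 16, 24, 40, 57, 69, 56],
     [14, 17, 22, 29, 51, 87, 80, 62],
     [18, 22, 37, 56, 68, 109, 103, 77],
     [24, 35, 55, 64, 81, 104, 113, 92],
     [49, 64, 78, 87, 103, 121, 120, 101],
     [72, 92, 95, 98, 112, 100, 103, 99]]

def scale_q(q, factor):
    """Scale every entry of Q by a constant, keeping values in 1..255."""
    return [[min(255, max(1, round(v * factor))) for v in row] for row in q]

coarse = scale_q(Q, 2.0)   # larger steps: more zeros after rounding, lower bit rate
print(coarse[0][:4])       # [32, 22, 20, 32]
```

A factor above 1 coarsens the quantization (fewer bits, more distortion); a factor below 1 does the opposite.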


Coding of the DC Residuals
 The DC residuals are in the range [-2047, 2047]
 Thus, the magnitude of each residual is between 0 and 2047 = 2^{11}-1,
inclusive.
 Divide this range into 12 subranges, or categories, where category k
ranges from 2^{k-1} to 2^{k}-1 inclusive. (Note that category 0
has only the integer 0).
 Let r be a nonzero DC residual of category k, and let s be its sign. Clearly,
|r| = 2^{k-1} + t, where 0 <= t <= 2^{k-1}-1. In particular,
t can be represented in binary using k-1 bits.
 Therefore, r can be uniquely represented by s, k, and t.
 Develop a Huffman code for the 12 categories, where every codeword
is at most 16 bits long
 Encode each DC residual r as a binary string hsm where
 h is the codeword of the residual's category k
 s = sign of the residual; s=0 if negative, 1 if positive
 m = the (k-1)-bit binary representation of t.
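The category/sign/offset split described above can be sketched as follows. The function names and the sample DC values are hypothetical. Two assumptions are made that the text does not state: the DPCM predictor for the first block is 0, and a zero residual (category 0) carries no s or m bits. The Huffman codeword h of the category is not produced here.

```python
def dc_residuals(dcs):
    """DPCM: difference between each DC and the DC of the preceding block.
    Assumption: the predictor for the first block is 0."""
    prev, out = 0, []
    for dc in dcs:
        out.append(dc - prev)
        prev = dc
    return out

def split_residual(r):
    """Return (k, s, m): category, sign bit, and the (k-1)-bit string of t."""
    mag = abs(r)
    k = mag.bit_length()          # 2^(k-1) <= mag <= 2^k - 1; k = 0 only for mag = 0
    if k == 0:
        return 0, "", ""          # assumption: category 0 needs no extra bits
    s = "0" if r < 0 else "1"
    t = mag - (1 << (k - 1))
    m = format(t, "b").zfill(k - 1) if k > 1 else ""
    return k, s, m

print(dc_residuals([6, 8, 5]))    # [6, 2, -3]
print(split_residual(-3))         # (2, '0', '1')
print(split_residual(2047))       # (11, '1', '1111111111')
```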

Coding of the AC Terms (The AC Huffman Table)
 All nonzero AC terms are of magnitude <= 2^{10}-1
 Let x be an AC term, and let d be the length of the zero run between x and the
previous nonzero AC term.
 1 <= |x| <= 2^{10}-1
 Divide the range 1 .. 2^{10}-1 into 10 categories, where category k
is the range (2^{k-1} .. 2^{k}-1) inclusive.
 Represent x by its sign s and its magnitude |x|. s = 0 if x<0, 1 otherwise.
 For whatever value of x, there is a unique k such that 2^{k-1} <= |x| <=
2^{k}-1. k is the category of x
 |x| = 2^{k-1} + t, where 0 <= t <= 2^{k-1}-1. In particular,
t can be represented in binary using k-1 bits.
 Therefore, x can be uniquely represented by s, k, and t.
 Thus, (d,x) can be represented by (d,k,s,t), where s is one bit and t is k-1 bits.
 The run length d is between 0 and 63.
 d = 16p + r, where r=0,1,2,...,15.
 r can be represented with 4 bits
r_{3}r_{2}r_{1}r_{0}.
 p can be represented with p copies of the byte 11110000, each standing for a run of 16 zeros:
11110000_{1} 11110000_{2} ... 11110000_{p}
 This implies that
d is 11110000_{1} 11110000_{2} ...
11110000_{p}r_{3}r_{2}r_{1}r_{0}
 the category (or level) k, being between 1 and 10, can be represented with
4 bits k_{3}k_{2}k_{1}k_{0}.
 Therefore, the (d,k) in the (d,k,s,t) representation of (d,x), is represented as
11110000_{1} 11110000_{2} ...
11110000_{p}r_{3}r_{2}r_{1}r_{0}k_{3}k_{2}k_{1}k_{0}
(since k >= 1, the last byte never equals 11110000)
 This representation of (d,k) can be viewed as a sequence of p+1 bytes.
 The last byte represents 16*10= 160 legitimate values
 Add to those the byte 11110000 and the end-of-block (EOB) symbol to signal
the end of the nonzero AC terms in a block.
 This results in 162 different symbols.
 Build a Huffman table for those 162 symbols,
where every codeword is at most 16 bits long
 JPEG encodes each quantized AC term (d,x)=(d,k,s,t) as hsm where
 h is the Huffman codeword of (d,k)
 s= sign of the term; s=0 if negative, 1 if positive
 m = the (k-1)-bit binary representation of t.
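The byte view of (d,k) can be sketched as below. This assumes the JPEG convention in which the byte 11110000 stands for a run of 16 zeros (so d = 16p + r with r = 0..15) and in which EOB takes the otherwise unused byte 00000000; the helper name run_category_bytes is hypothetical.

```python
ZRL = 0xF0   # byte 11110000: a run of 16 zeros (assumed JPEG convention)
EOB = 0x00   # assumption: EOB uses the otherwise unused byte 00000000

def run_category_bytes(d, k):
    """Bytes for (d, k): one ZRL byte per 16 zeros, then the byte rrrrkkkk."""
    assert d >= 0 and 1 <= k <= 10
    p, r = divmod(d, 16)
    return [ZRL] * p + [(r << 4) | k]

# The symbol alphabet: 16 runs x 10 categories, plus ZRL and EOB.
alphabet = {(r << 4) | k for r in range(16) for k in range(1, 11)} | {ZRL, EOB}
print(len(alphabet))                                            # 162
print([format(b, "08b") for b in run_category_bytes(2, 2)])     # ['00100010']
print([format(b, "08b") for b in run_category_bytes(35, 1)])    # ['11110000', '11110000', '00110001']
```

Each byte in the returned list is one symbol to be looked up in the AC Huffman table.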

Examples
 Take an 8×8 block of Lena B:
143  147  149  152  156  147  146  149

151  146  143  154  148  144  153  132

147  143  145  149  144  145  128  133

152  145  145  144  146  134  130  137

146  143  142  147  124  127  139  138

139  145  139  127  126  135  139  141

145  137  124  130  138  136  140  144

144  124  136  134  137  139  142  145

 Normalize B. It becomes NB = B - 128:
15  19  21  24  28  19  18  21

23  18  15  26  20  16  25  4

19  15  17  21  16  17  0  5

24  17  17  16  18  6  2  9

18  15  14  19  -4  -1  11  10

11  17  11  -1  -2  7  11  13

17  9  -4  2  10  8  12  16

16  -4  8  6  9  11  14  17

 Perform DCT, resulting in a block D:
103.4  12.4  6.0  2.1  8.1  5.7  0.7  0.4

31.7  11.6  22.8  0.7  0.1  0.5  4.7  1.8

11.0  21.2  3.7  8.8  2.3  0.1  0.2  5.6

0.2  4.9  10.3  8.6  8.9  2.5  7.6  1.4

4.9  0.4  1.9  8.9  6.6  2.6  3.6  3.6

0.7  0.9  0.9  4.1  7.2  15.5  2.8  1.8

3.1  1.0  2.5  5.5  5.2  6.7  10.7  1.1

2.9  0.2  1.2  2.7  3.6  2.6  0.4  6.4

 Quantize (round(D./Q)), resulting in Dq:
6  1  0  0  0  0  0  0

4  1  -2  0  0  0  0  0

1  -2  0  0  0  0  0  0

0  0  0  0  0  0  0  0

0  0  0  0  0  0  0  0

0  0  0  0  0  0  0  0

0  0  0  0  0  0  0  0

0  0  0  0  0  0  0  0

 zigzag ordering: 6 1 4 1 1 0 0 -2 -2 followed by all zeros
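The zigzag scan of the quantized block can be sketched as follows. The two entries of magnitude 2 are taken as negative, as the sign bits (s=0) in the coded example indicate; the function names are hypothetical.

```python
def zigzag_indices(n=8):
    """(row, col) pairs in zigzag order for an n x n block."""
    order = []
    for s in range(2 * n - 1):                 # s = row + col: one anti-diagonal at a time
        rows = range(max(0, s - n + 1), min(s, n - 1) + 1)
        if s % 2 == 0:
            rows = reversed(rows)              # even diagonals run from bottom-left to top-right
        order.extend((i, s - i) for i in rows)
    return order

def zigzag(block):
    return [block[i][j] for i, j in zigzag_indices(len(block))]

Dq = [[6, 1, 0, 0, 0, 0, 0, 0],
      [4, 1, -2, 0, 0, 0, 0, 0],
      [1, -2, 0, 0, 0, 0, 0, 0]] + [[0] * 8 for _ in range(5)]

seq = zigzag(Dq)
print(seq[:9])        # [6, 1, 4, 1, 1, 0, 0, -2, -2]
print(any(seq[9:]))   # False: everything after the ninth term is zero
```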

Example Huffman Table for Lena
Zero Run  Category  Codelength  Codeword

0  1  2  00

0  2  2  01

0  3  3  100

0  4  4  1011

0  5  5  11010

0  6  6  111000

0  7  7  1111000

.  .  .  .

1  1  4  1100

1  2  6  111001

1  3  7  1111001

1  4  9  111110110

.  .  .  .

2  1  5  11011

2  2  8  11111000

.  .  .  .

3  1  6  111010

3  2  9  111110111

.  .  .  .

4  1  6  111011

5  1  7  1111010

6  1  7  1111011

7  1  8  11111001

8  1  8  11111010

9  1  9  111111000

10  1  9  111111001

11  1  9  111111010

.  .  .  .

.  .  .  .

End of Block (EOB)  4  1010
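A Huffman code must be prefix-free, which can be checked mechanically for the rows listed above. The sketch below covers only the rows shown (the elided rows are not guessed); the dictionary name CODES and the helper is_prefix_free are hypothetical.

```python
# (zero run, category) -> codeword, copied from the listed rows only
CODES = {
    (0, 1): "00", (0, 2): "01", (0, 3): "100", (0, 4): "1011",
    (0, 5): "11010", (0, 6): "111000", (0, 7): "1111000",
    (1, 1): "1100", (1, 2): "111001", (1, 3): "1111001", (1, 4): "111110110",
    (2, 1): "11011", (2, 2): "11111000",
    (3, 1): "111010", (3, 2): "111110111",
    (4, 1): "111011", (5, 1): "1111010", (6, 1): "1111011",
    (7, 1): "11111001", (8, 1): "11111010",
    (9, 1): "111111000", (10, 1): "111111001", (11, 1): "111111010",
    "EOB": "1010",
}

def is_prefix_free(codewords):
    """True if no codeword is a prefix of another.
    After sorting, a prefix is always immediately followed by a word it prefixes."""
    words = sorted(codewords)
    return all(not b.startswith(a) for a, b in zip(words, words[1:]))

print(is_prefix_free(CODES.values()))               # True
print(max(len(c) for c in CODES.values()) <= 16)    # True: within the 16-bit limit
```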


Coding the Example Block
 zigzag ordering: 6 1 4 1 1 0 0 -2 -2 followed by all zeros
 6 is coded separately along with other DC terms.
 the AC terms are coded as follows:
 the AC terms as a sequence of (d,x) pairs:
(0,1) (0,4) (0,1) (0,1) (2,-2) (0,-2) EOB
 As a sequence of (d,k,s,t), where t is in decimal:
(0,1,1,0) (0,3,1,0) (0,1,1,0) (0,1,1,0) (2,2,0,0) (0,2,0,0) EOB
 As a sequence of (d,k,s,t), where t is in binary using k-1 bits:
(0,1,1,) (0,3,1,00) (0,1,1,) (0,1,1,) (2,2,0,0) (0,2,0,0) EOB
 The final code:
00 1 100 1 00 00 1 00 1 11111000 0 0 01 0 0 1010
 number of bits to code the AC terms: 33 bits
 Bitrate per symbol (out of the 63 AC symbols): 33/63 = 0.52 bits/symbol
 Compression ratio based on those ACs alone: (63*8/33), approximately 15
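The coding of this block can be reproduced with a short sketch. It uses only the Huffman table rows this block needs, takes the two terms of magnitude 2 as negative (as their sign bits s=0 indicate), and does not handle zero runs longer than 15; the names HUFF and encode_ac are hypothetical.

```python
# Only the table rows this block needs; "EOB" marks end of block.
HUFF = {(0, 1): "00", (0, 2): "01", (0, 3): "100",
        (2, 2): "11111000", "EOB": "1010"}

def encode_ac(ac_terms):
    """Encode the zigzag-ordered AC terms of one block as a list of h+s+m strings."""
    last = max((i for i, x in enumerate(ac_terms) if x != 0), default=-1)
    pieces, run = [], 0
    for x in ac_terms[:last + 1]:
        if x == 0:
            run += 1                                 # extend the current zero run
            continue
        mag = abs(x)
        k = mag.bit_length()                         # category
        t = mag - (1 << (k - 1))
        m = format(t, "b").zfill(k - 1) if k > 1 else ""
        s = "0" if x < 0 else "1"
        pieces.append(HUFF[(run, k)] + s + m)        # runs above 15 are not handled
        run = 0
    pieces.append(HUFF["EOB"])
    return pieces

ac = [1, 4, 1, 1, 0, 0, -2, -2] + [0] * 55           # the 63 AC terms
code = encode_ac(ac)
bits = sum(len(c) for c in code)
print(" ".join(code))   # 001 100100 001 001 1111100000 0100 1010
print(bits, round(bits / 63, 2), round(63 * 8 / bits, 1))   # 33 0.52 15.3
```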

Decoding
 Entropy-decode the bitstream back to the quantized blocks
 Dequantize: multiply each block coefficient by the corresponding coefficient
of the quantization matrix
 Apply the inverse DCT transform on each block
 Denormalize: add 128 to each resulting pixel value
 Example: Reconstructed block B'
143  147  153  157  157  154  149  145

145  147  151  154  153  149  144  141

146  147  149  149  146  142  137  134

146  146  145  142  139  135  132  130

145  143  140  136  133  131  130  130

141  138  134  131  130  131  134  135

136  134  130  128  129  133  139  142

133  131  127  126  129  135  142  147

 Error block (B - B')
0  0  -4  -5  -1  -7  -3  4

6  -1  -8  0  -5  -5  9  -9

1  -4  -4  0  -2  3  -9  -1

6  -1  0  2  7  -1  -2  7

1  0  2  11  -9  -4  9  8

-2  7  5  -4  -4  4  5  6

9  3  -6  2  9  3  1  2

11  -7  9  8  8  4  0  -2

 MSE=5.2
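The dequantization, inverse transform, and denormalization steps can be sketched as follows. This is an unoptimized illustration with hypothetical function names; it assumes the orthonormal 2-D DCT scaling, and the rounding and clipping to 0..255 at the end are assumptions of this sketch.

```python
import math

N = 8

def dequantize(dq, q):
    """Multiply each quantized coefficient by the matching entry of Q."""
    return [[dq[u][v] * q[u][v] for v in range(N)] for u in range(N)]

def idct_8x8(coeffs):
    """Inverse of the orthonormal 2-D DCT-II (direct O(N^4) form)."""
    def c(u):
        return 1 / math.sqrt(2) if u == 0 else 1.0
    out = [[0.0] * N for _ in range(N)]
    for x in range(N):
        for y in range(N):
            s = 0.0
            for u in range(N):
                for v in range(N):
                    s += (c(u) * c(v) * coeffs[u][v]
                          * math.cos((2 * x + 1) * u * math.pi / 16)
                          * math.cos((2 * y + 1) * v * math.pi / 16))
            out[x][y] = 0.25 * s
    return out

def denormalize(block):
    """Add 128 back, round, and clip to the 8-bit range (clipping is an assumption)."""
    return [[min(255, max(0, round(p + 128))) for p in row] for row in block]

# Usage: a block whose only nonzero quantized term is DC = 4, with Q[0][0] = 16,
# decodes to a flat block of 128 + 4 * 16 / 8 = 136.
dq = [[0] * N for _ in range(N)]
dq[0][0] = 4
q16 = [[16] * N for _ in range(N)]   # hypothetical uniform Q, for illustration
rec = denormalize(idct_8x8(dequantize(dq, q16)))
print(rec[0][:3])   # [136, 136, 136]
```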

Extended JPEG
 Extended JPEG allows for several optional enhancements:
 12-bit/pixel input
 Arithmetic coding is allowed as an alternative
to Huffman coding
 Adaptive quantization: Allows a 5-bit scale change to
the quantization matrix from one block to another
 SPIFF: A file format that provides for the interchange of
compressed image files between different application
environments
 Progressive transmission (PT)
 Sequential PT
 Spectral selection
 Successive approximations
 Hierarchical PT (pyramid encoding)
 Tiling
 Selective refinement
Back to Top

Performance of JPEG
 Original Lena
 JPEG-compressed Lena at a compression ratio of 12:1
 JPEG-compressed Lena at a compression ratio of 20:1
 JPEG-compressed Lena at a compression ratio of 32:1

MPEG (1 & 2): Basic Concepts
 Video is a sequence of images called frames
 Color
 Three components: red (R), green (G), and blue (B)
 For compatibility with non-colored media, the RGB model
was converted to an equivalent model, YC_{b}C_{r}
 Y is the luminance component, which was
experimentally determined to be
Y = 0.299R + 0.587G + 0.114B
 C_{b} = B - Y
 C_{r} = R - Y
 Y is referred to as luma, and
C_{b} & C_{r} as chroma
 Every frame is really 3 images: one Y, one C_{b} and one C_{r}
 8 bits/pixel for each of the three color components
 Because human vision is less sensitive to color,
the C_{b} and C_{r} images are downsampled by 2
in each dimension (they are a quarter of the size of the Y images)
 Much of MPEG processing is on the basis of a macroblock:
A 16×16 luminance block with the two 8×8
associated chroma blocks
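The color conversion and chroma downsampling can be sketched as below. The color differences are computed exactly as defined above, without the scaling and offsets a real codec applies. Averaging each 2×2 neighbourhood is an assumption of this sketch (the text only states the factor of 2), and the function names and the uniform test frame are hypothetical.

```python
def rgb_to_ycbcr(r, g, b):
    """Luma and color differences as defined above (no scaling or offset)."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    return y, b - y, r - y

def downsample_2x2(plane):
    """Halve a plane in each dimension by averaging 2x2 neighbourhoods.
    Assumes even width and height."""
    h, w = len(plane), len(plane[0])
    return [[(plane[i][j] + plane[i][j + 1]
              + plane[i + 1][j] + plane[i + 1][j + 1]) / 4
             for j in range(0, w, 2)]
            for i in range(0, h, 2)]

# Usage on one 16x16 macroblock of a hypothetical uniform orange frame
rgb = [[(255, 128, 0)] * 16 for _ in range(16)]
ycc = [[rgb_to_ycbcr(*px) for px in row] for row in rgb]
Y = [[p[0] for p in row] for row in ycc]                      # 16x16 luma
Cb = downsample_2x2([[p[1] for p in row] for row in ycc])     # 8x8 chroma
Cr = downsample_2x2([[p[2] for p in row] for row in ycc])     # 8x8 chroma
print(len(Y), len(Y[0]), len(Cb), len(Cb[0]), len(Cr), len(Cr[0]))   # 16 16 8 8 8 8
```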

Modes of MPEG Compression
 Intraframe compression
 Exploits spatial redundancy only
 Operates on single frames independently of other frames
 Interframe compression
 Exploits both spatial and temporal redundancies
 Employs motion estimation (ME) without standardizing any
ME algorithm
 Derives motion-compensated predictions of frames
 Finally, it performs JPEG-like compression
on the residual frames

Types of Frames in MPEG
 MPEG has 3 types of frames: I, P, and B
 I frames are strictly intra compressed as in JPEG. Their purpose
is to provide random access points to the video
 P frames are motion-compensated forward-predictive-coded frames; they are interframe compressed, and typically provide more
compression than I frames
 B frames are motion-compensated
bidirectionally-predictive-coded
frames; they are interframe compressed, and typically provide
the most compression
 The relative numbers of I, P and B frames are arbitrary
 An I frame must occur at least once every 132 frames to provide
user-acceptable speed of random access to various parts of a video

Interframe Compression of P and B Blocks
 A motion-compensated prediction (i.e., approximation) f'
of a P/B frame f is made
 The residual f - f' is then compressed in a JPEG-like style
 Any macroblock of the original frame f may be strictly intra
compressed if its prediction is deemed to be poor
 Issues to be addressed
 Method of strict intra compression
 Method of compressing residual macroblocks
 Motion-compensated prediction

Flowchart of MPEG Compression

Motion Estimation and Prediction
 Motion estimation is performed on the basis of macroblocks, using
the 16×16 luminance blocks only
 Motion is assumed to be uniform across all the pixels of a macroblock
 Remark: There is a tradeoff in deciding the size of the basic block for
motion estimation
 The block has to be sufficiently large to avoid ``false hits''
 The block has to be sufficiently small to avoid
diverse motions within one single block
 The MPEG block size, 16×16, is a good compromise

Motion Estimation and Prediction for P Frames
 Consider a P-frame P
 P will be predicted (i.e., approximated) from one single reference
frame R
 R is the most recent (decoded) I or P frame
 For each macroblock MB of P, find the closest matching macroblock MB'
in the reference frame R
 If the MB-to-MB' match is satisfactory, then
 treat MB' as the prediction (i.e., approximation) of MB
 record the motion (i.e., displacement) vector between the two
macroblocks (allowing half-pixel accuracy)
 Compute and compress the macroblock residual MB - MB'
(luma and chroma)
 If the MB-to-MB' match is found to be unsatisfactory, then
the macroblock MB is strictly intra compressed as is done in I frames
 The motion vectors of all the macroblocks of P exhibit redundancy
due to similar (or sometimes identical) motion experienced by many
neighboring macroblocks
 This redundancy is exploited by coding the consecutive differential
values of motion vectors (i.e., DPCM)
 Remark 1: MPEG does not standardize the decision mechanism for judging
whether or not a match between two macroblocks is satisfactory
 Remark 2: A typical decision mechanism involves computing an error measure
between the luminance of the two macroblocks.
The match is treated as satisfactory
if and only if the error is below a certain threshold.
Possible error measures include mean-square error (MSE),
mean absolute-difference error (MAD),
and variance(MB - MB')/variance(MB).
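Since MPEG standardizes neither the search nor the decision rule, the following is only one possible sketch: an exhaustive integer-pixel search using MAD on luminance, followed by a threshold test and DPCM of the resulting vectors. Half-pixel refinement is omitted, and the search range, the threshold value, and all function names are hypothetical choices.

```python
def mad(cur, ref, bx, by, dx, dy, n):
    """Mean absolute difference between the n x n block of cur at (bx, by)
    and the block of ref displaced by (dx, dy)."""
    total = 0
    for y in range(n):
        for x in range(n):
            total += abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
    return total / (n * n)

def best_match(cur, ref, bx, by, n=16, search=4):
    """Full search over integer displacements in [-search, search]."""
    h, w = len(ref), len(ref[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            if not (0 <= bx + dx and bx + dx + n <= w
                    and 0 <= by + dy and by + dy + n <= h):
                continue                  # candidate falls outside the reference frame
            err = mad(cur, ref, bx, by, dx, dy, n)
            if best is None or err < best[2]:
                best = (dx, dy, err)
    return best

def code_macroblock(cur, ref, bx, by, threshold=8.0, n=16):
    """('inter', (dx, dy)) if the match is satisfactory, else ('intra', None)."""
    dx, dy, err = best_match(cur, ref, bx, by, n)
    return ("inter", (dx, dy)) if err < threshold else ("intra", None)

def dpcm(vectors):
    """Differences between consecutive motion vectors (predictor starts at (0, 0))."""
    prev, out = (0, 0), []
    for vx, vy in vectors:
        out.append((vx - prev[0], vy - prev[1]))
        prev = (vx, vy)
    return out

print(dpcm([(2, 1), (2, 1), (3, 1)]))   # [(2, 1), (0, 0), (1, 0)]
```

Identical neighboring vectors turn into (0, 0) differences, which is the redundancy the DPCM step exploits.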

Motion Estimation and Prediction for B Frames
 Consider a B-frame B
 B will be predicted (i.e., approximated) from TWO reference
frames R_{1} and R_{2}
 R_{1} is the most recent (decoded) past I/P frame, and R_{2}
is the nearest (decoded) future I/P frame
 For each macroblock MB of B, find the closest matching macroblock MB_{1}
in the reference frame R_{1}, and the closest matching macroblock
MB_{2} in R_{2}
 The predicted macroblock is PM = NINT(alpha_{1}MB_{1} + alpha_{2}MB_{2})
 alpha_{1}=0.5 and alpha_{2}=0.5 if both matches are satisfactory
 alpha_{1}=1 and alpha_{2}=0 if only the 1st match is satisfactory
 alpha_{1}=0 and alpha_{2}=1 if only the 2nd match is satisfactory
 alpha_{1}=0 and alpha_{2}=0 if neither match is satisfactory
 Compute and compress the macroblock residual MB - PM (luma and chroma)
 If neither match is found to be satisfactory, then
the macroblock MB is strictly intra compressed as is done in I frames
 Record the motion vector(s) between the MB and the other one or two
macroblocks (allowing half-pixel accuracy)
 Again, the motion-vector redundancy is exploited by coding the
consecutive differential values of motion vectors (i.e., DPCM)
 Remark: The prediction mode chosen is 2-bit coded and passed on
along with the macroblock header information
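The choice among the four prediction modes can be sketched as follows. The mode labels are names used only in this sketch (the actual 2-bit codes are not given here), NINT is implemented as rounding ties upward, which is an assumption, and the 2×2 arrays are hypothetical stand-ins for real macroblocks.

```python
import math

def predict_b(mb1, mb2, ok1, ok2):
    """Return (mode, PM); PM is None when the macroblock must be intra coded."""
    if ok1 and ok2:
        a1, a2, mode = 0.5, 0.5, "interpolated"
    elif ok1:
        a1, a2, mode = 1.0, 0.0, "forward"
    elif ok2:
        a1, a2, mode = 0.0, 1.0, "backward"
    else:
        return "intra", None
    # NINT: nearest integer, ties rounded up (assumption; pixel values are >= 0)
    pm = [[math.floor(a1 * p + a2 * q + 0.5) for p, q in zip(r1, r2)]
          for r1, r2 in zip(mb1, mb2)]
    return mode, pm

def residual(mb, pm):
    """Pointwise difference MB - PM."""
    return [[a - b for a, b in zip(ra, rb)] for ra, rb in zip(mb, pm)]

# Usage on hypothetical 2x2 "macroblocks" (real ones are 16x16 luma plus chroma)
mb = [[12, 20], [30, 41]]
mb1 = [[10, 20], [30, 40]]
mb2 = [[13, 21], [31, 43]]
mode, pm = predict_b(mb1, mb2, True, True)
print(mode, pm)            # interpolated [[12, 21], [31, 42]]
print(residual(mb, pm))    # [[0, -1], [-1, -1]]
```

Four modes fit in the 2 bits mentioned in the remark above.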

MPEG2
 Higher data rates than MPEG1
 MPEG2 allows for higher quality source images
 4:2:2 (chroma subsampled only horizontally)
 4:4:4 (no subsampling of chroma)
 Note: 4:2:0 is what's supported by MPEG1
 MPEG2 allows for finer quantization and for specifying
separate quantization tables for luma and chroma
 MPEG2 allows for finer adjustment of quantization scale factor,
used in intra compression
 MPEG2 allows for interlaced video
 MPEG2 supports error concealment (of lost macroblocks)
 MPEG2 supports scalable compression
 SNR-scalability: by sending bands of DCT coefficients
 spatial-scalability: pixel resolution by down/upsampling
 temporal-scalability: different frame rates by skipping frames
 MPEG2 has a Profile and Level structure
 Profiles are algorithmic elements included in MPEG2
 Levels are upper bounds on parameter values
 Profiles (profiles are backward-compatible)
 Simple: No use of B frames
 Main: what was described earlier, but no scalability
 SNR-scalable
 Spatially scalable
 High: temporally scalable, higher-quality source data (4:2:0,
4:2:2 and possibly 4:4:4 in the future)
 Levels
 Low: 352 × 240 frame size
 Main: 720 × 480 frame size
 High 1440: 1440 × 1152 frame size
 Very High: 1920 × 1080 frame size

References
 W. B. Pennebaker and J. L. Mitchell,
JPEG Still Image Data Compression Standard,
Van Nostrand Reinhold, New York, 1993.
 ISO 11172-2: Generic Coding of moving pictures and associated audio (MPEG1)
 ISO 13818-2: Generic Coding of moving pictures and associated audio (MPEG2)

Links to other Standards