Introduction



Much of the information is in form of images
Images are handled by machines as a matrix
of digital picture elements, or pixels
The appearance of an image depends on

image type

resolution
Gabriele Monfardini - Corso di Basi di Dati Multimediali a.a. 2004-2005
1
Types of images & Resolution

bilevel (black & white)

e.g. faxes

grayscale
color

dot per inches (dpi)




600 x 600 – actual medium quality laser printer
1200 x 1200 – low cost phototypesetter
4800 x 4800 – high resolution phototypesetter
Gabriele Monfardini - Corso di Basi di Dati Multimediali a.a. 2004-2005
2
Bilevel images: CCITT fax
standard




fax: facsimile
CCITT Comité Consultatif International Téléphonique et
Télégraphique, it is part of the ITU International
Telecommunication Union, one of the specialized
agencies of the United Nations
In the late 70s CCITT starts thinking about a
standard for fax transmission
1980 CCITT Group 3 standard

group 1 & 2 are earlier attempt, which use simpler
encoding and modulations techniques, resulting in
very slow transmissions
Gabriele Monfardini - Corso di Basi di Dati Multimediali a.a. 2004-2005
3
CCITT Group 3 - I



It is the most common standard for fax
transmission
It is accepted worldwide, almost every fax
machine supports this standard
It uses compression algorithms for bilevel
images
Gabriele Monfardini - Corso di Basi di Dati Multimediali a.a. 2004-2005
4
CCITT Group 3 - II






Paper size: international A4 (not US letter)
standard resolution 204x98 dpi (200x100)
high resolution 204x196 dpi (200x200)
bilevel image  1 bit/pixel
image size: 1728x1188 bits at
standard resolution  about 2 Mbit
Transmission rate: 4.8 Kbit/s
1728 bits/line
1188
lines/page
today is usually higher, 14.4 – 33.6 Kbit/s
At 4.8Kbit/s in std resolution one page would
take about 430 sec, but only 1 minute on
average with Group 3 algorithms


5
Run-length coding



Each scan line is composed by
sequences of pixel of the same color
Count the number of element of each
run
Example 3w 4b 9w 2b 2w 6b 5w 2b 5w...
Gabriele Monfardini - Corso di Basi di Dati Multimediali a.a. 2004-2005
6
G3 1D


Group 3 One-Dimensional coding (G3 1D) is
called Modified Huffman (MH) as it encodes run
lengths using a
predefined Huffman
code
In order to maintain
black/white
syncronization, each line
begins with a white run,
eventually of zero length
Gabriele Monfardini - Corso di Basi di Dati Multimediali a.a. 2004-2005
7
G3 1D
1000 011 10100 11 0111 0010 ...

predefined Huffman codewords
have been found from the
probabilities of the runs in
typical handwritten documents
Gabriele Monfardini - Corso di Basi di Dati Multimediali a.a. 2004-2005
8
G3 1D


As one line has 1728 bits, we have to define a
codeword for all 1728 black and white run
lengths
As shorter runs occur more frequently that
longer runs, we code each run length in an
additive form




there is a terminating and makeup codeword
Lengths form 0 to 63 are coded with a single terminating
codeword
Longer runs are coded with one or more makeup
codewords and a terminating codeword
Each line is terminated with a EOL symbol composed of
eleven 0 and one 1
Gabriele Monfardini - Corso di Basi di Dati Multimediali a.a. 2004-2005
9
G3 2D



Group 3 Two-Dimensional coding (G3 2D) is
called Modified READ (MR) as it is a variant of a
previously defined code, called READ (Relative
Element Address Designate)
Many images have a high degree of vertical
coherence between consecutive lines
changing elements are coded w.r.t. a “nearby” change
position of the same color in the previous (reference) line
10
G3 2D



Nearby means within an interval of radius 3
pixels
If there are changing elements in the current
line without correspondents in the reference
line  switch to horizontal mode (1D)
On the opposite if the ref line has a run with
no counterpart in the current line  special
pass code
Gabriele Monfardini - Corso di Basi di Dati Multimediali a.a. 2004-2005
11
G3 2D
reference line
current line
vertical mode horizontal pass vertical
mode
code mode
-1
0
from a Huffman table,
generated
with codewords for
code
-3, -2, -1, 0, +1, +2, +3
...
+2 -2
<mode | length of preceding
white run | length of black run>
0001
12
G3 2D

Two dimensional coding is more prone to
transmission errors





In the G3 1D an error may cause problems in the
entire line, but syncronization is forced back by EOL
codeword
Here an error in the reference line is likely
propagated in all the other lines
For this reason there are 1 reference line for each k
lines (i.e. k-1 are coded w.r.t. each ref line)
standard resolution  k=2
high resolution  k=4
Gabriele Monfardini - Corso di Basi di Dati Multimediali a.a. 2004-2005
13
CCITT fax standard
compression performances

Standard resolution (~200x100 dpi)



High resolution (~200x200 dpi)



G3 1D
 0.13 bits/pixel 57s. for A4 at 4.8 Kbps
G3 2D (k=2)  0.11 bits/pixel 47s. for A4 at 4.8 Kbps
G3 2D (k=4)  0.09 bits/pixel 74s. for A4 at 4.8 Kbps
Compression is very good for office image where
run lengths are long
It would be very bad for bilevel natural images
14
Continuous-tone images:
why lossless compression?


lossy compression is often preferred to have
remarkably more compressed images, with
good quality
However there are some situations in which
using an approximation may not be adequate



medical images
historical documents
images with legal relevance
Gabriele Monfardini - Corso di Basi di Dati Multimediali a.a. 2004-2005
15
Continuous-tone images:
lossless compression



GIF standard
PNG standard
JPEG-LS



It is a quite new standard. The original JPEG
standard included a lossless mode, but its
performances were not close to ‘state of the
art’
extimation of pixel value using quite simple
context: effective and low cost solution
www.hpl.hp.com/loco
16
GIF image format - I




Adopted by CompuServe to minimize the time
required to download images over a modem
link
The most widely used lossless image format
until 1995
8-bit pixel description
256 color images, but it is possible to use a
color map
Gabriele Monfardini - Corso di Basi di Dati Multimediali a.a. 2004-2005
17
GIF image format - II

The color map can be specified for each image
or can be omitted



if specified, it is included as an header into image
file, in uncompressed form
color map is composed of 256 24-bit entries, that
specify 256 RGB colors
Compression scheme used is LZW

Alphabet symbols are the 256 colors of the color
map plus a “clear” code and an “end-of-information”
code
Gabriele Monfardini - Corso di Basi di Dati Multimediali a.a. 2004-2005
18
GIF image format - III




Even if this feature is not widely used, GIF files
may contain more than one image, and it is
possible to share the color map
LZW-coded information is grouped into blocks
preceded by a byte-count, in order to skip an
image without decompressing it
In 1995 Unisys announced that there would be
royalties on GIF implementations due to an old
patent they held on LZW
This catalyzed the development of a new lossless
image format, designed for public domain and
with the last improvements
19
PNG image format - I




Portable Network Graphics (pronounced “ping”)
it uses gzip compression scheme
through some improvements compression obtained
is about 10-30% better than GIF
By default it encodes the pixels in raster scan order,
but some other methods are available



it is possible to code horizontal difference, i.e. the
difference between current pixel value and the previous
one or vertical difference, i.e. the difference w.r.t. the
above pixel
average difference, the difference with the average of
above and next pixel
...
Gabriele Monfardini - Corso di Basi di Dati Multimediali a.a. 2004-2005
20
PNG image format - II



It is possible to use more than 256 colors, up
to 16 bit grayscale and 48 bit color
GIF uses one special pixel value to indicate
transparency, PNG uses 256 different values
per pixel, allowing for picture progressively
fading into the background
It seems inevitable that PNG format will
gradually assume the role of standard lossless
image format for the WWW, replacing GIF
Gabriele Monfardini - Corso di Basi di Dati Multimediali a.a. 2004-2005
21
Continuous-tone images:
why lossy compression?



Digital images are yet an approximation of the
real analog phenomenon
lossy techniques allow to obtain very good
compression with a modest lost of details
This is useful for storing and trasmitting
images
Gabriele Monfardini - Corso di Basi di Dati Multimediali a.a. 2004-2005
22
Continuous-tone images:
lossy compression


JPEG
JPEG2000



a new image coding system that uses stateof-the-art compression techniques based on
wavelet technology
file extension .jp2
With very compressed files, if image size is
the same, perceived quality of JPEG2000
images is better w.r.t. JPEG images
Gabriele Monfardini - Corso di Basi di Dati Multimediali a.a. 2004-2005
23
JPEG format - I





JPEG is a standard defined by the Joint
Photographic Experts Group in 1992
It was conceived to transmit images at 64
Kbps
It has a lossy mode and a lossless mode (not
so much used, and today replaced by the
JPEG-LS standard)
With lossy mode it allows to obtain very good
quality at about 1 bit/pixel
Implementation complexity is reasonable
Gabriele Monfardini - Corso di Basi di Dati Multimediali a.a. 2004-2005
24
JPEG format - II



It could be used with graylevel and color
images
Each channel of the color space (RGB, YUV...)
is treated separately
it allows progressive transmission (that is
much better suited for WWW than raster
transmission)
Gabriele Monfardini - Corso di Basi di Dati Multimediali a.a. 2004-2005
25
Raster vs. progressive
transmission
JPEG Coder - I
Discrete
Cosine
Transform
Quantization
Binary
Encoder
Gabriele Monfardini - Corso di Basi di Dati Multimediali a.a. 2004-2005
27
JPEG Coder - II





Image is divided in 8x8-pixel squares
Preprocessing
Apply Discrete Cosine Transform on
each square
Coefficient quantization
Bit stream encoding
Gabriele Monfardini - Corso di Basi di Dati Multimediali a.a. 2004-2005
28
Preprocessing: color space
transformation & downsampling

from RGB into YUV


The Y component represents the brightness
of a pixel, and the U and V components
together represent the hue and saturation
Human eye can see more detail in the Y
component than in the U and V, that can be
compressed more aggressively



4:4:4
4:2:2
4:2:0
no downsampling
horizontal downsampling of a factor 2
both horizontal and vertical downsampling
Gabriele Monfardini - Corso di Basi di Dati Multimediali a.a. 2004-2005
29
Discrete Cosine Transform - I


The discrete cosine transform (DCT)
is a Fourier-related transform similar to
the discrete Fourier transform (DFT),
but using only real numbers
It is used in JPEG because it is fast and
quite easy to implement efficiently
Gabriele Monfardini - Corso di Basi di Dati Multimediali a.a. 2004-2005
30
Discrete Cosine Transform - II
where





the block is N1  N 2 pixels (in JPEG, 8x8)
A(i,j) is the value of pixel of position (i,j)
B(k1 ,k 2 ) is the DCT coefficient of position (k1 ,k 2 )
low values for k1 corresponds to low vertical
frequencies, low values for k 2 to low horizontal
frequencies
Generally higher frequencies have very low values
Gabriele Monfardini - Corso di Basi di Dati Multimediali a.a. 2004-2005
31
Discrete Cosine Transform - III


DCT function basis
each 8x8 square is
reduced to 64
coefficient
Gabriele Monfardini - Corso di Basi di Dati Multimediali a.a. 2004-2005
32
Discrete Cosine Transform - IV


Knowing with infinite precision the 64 DCT
coefficient it is possible to reconstruct exactly
the pixels of the square
But



finite precision
quantization of the coefficients (always)
Some coefficient related to high frequency are not
transmitted. This allows higher compression without
sacrifying too much quality as human eye is less
responsible
Gabriele Monfardini - Corso di Basi di Dati Multimediali a.a. 2004-2005
33
Quantization - I



The DCT matrix obtained is scaled differently in each
component, dividing each by a diferent factor
the factor for each component has been decided based
on human sensitivity to changes at each frequency
In practice the matrix of factor is usually
34
Quantization - II


Next, all values are rounded to nearest integer
This leads to a quite high number of 0s in the
high frequency zone, as factors are bigger
Gabriele Monfardini - Corso di Basi di Dati Multimediali a.a. 2004-2005
35
Zig-zag scan


Low frequency
coefficients are
transmitted before
higher frequency
coefficients
This allows for
progressive
visualization of this
8x8 block
Gabriele Monfardini - Corso di Basi di Dati Multimediali a.a. 2004-2005
36
Raster vs. progressive
transmission

Raster transmission


DCT coefficient of the upper left block, then
those of all the others in the upper part of
the image and so on
Progressive transmission

first all (0,0) coefficients, than all (0,1) and
so on, following zig-zag scan in each block
Gabriele Monfardini - Corso di Basi di Dati Multimediali a.a. 2004-2005
37
Binary coding

DCT(0,0) has usually a very slow variation
from one block to the next, as it is the mean
value


Tipically the bit stream is coded with Huffman


For this reason it is convenient to encode the
difference from the previous value
It is possible to use arithmetic scheme, gaining some
compression at cost of decoding speed
Huffman codes are predefined, or it is possible
to build optimal tables and insert them in the
stream
Gabriele Monfardini - Corso di Basi di Dati Multimediali a.a. 2004-2005
38
JPEG Decoder
Binary
Decoder
Some values are lost!
Dequantization
Inverse DCT
Good quality, but
reconstruction is not
exact
Gabriele Monfardini - Corso di Basi di Dati Multimediali a.a. 2004-2005
39
JPEG performances - I
Gabriele Monfardini - Corso di Basi di Dati Multimediali a.a. 2004-2005
40
JPEG performances - II
Original
Quality factor 75
Quality factor 20
Quality factor 3
41