Introduction Much of the information is in form of images Images are handled by machines as a matrix of digital picture elements, or pixels The appearance of an image depends on image type resolution Gabriele Monfardini - Corso di Basi di Dati Multimediali a.a. 2004-2005 1 Types of images & Resolution bilevel (black & white) e.g. faxes grayscale color dot per inches (dpi) 600 x 600 – actual medium quality laser printer 1200 x 1200 – low cost phototypesetter 4800 x 4800 – high resolution phototypesetter Gabriele Monfardini - Corso di Basi di Dati Multimediali a.a. 2004-2005 2 Bilevel images: CCITT fax standard fax: facsimile CCITT Comité Consultatif International Téléphonique et Télégraphique, it is part of the ITU International Telecommunication Union, one of the specialized agencies of the United Nations In the late 70s CCITT starts thinking about a standard for fax transmission 1980 CCITT Group 3 standard group 1 & 2 are earlier attempt, which use simpler encoding and modulations techniques, resulting in very slow transmissions Gabriele Monfardini - Corso di Basi di Dati Multimediali a.a. 2004-2005 3 CCITT Group 3 - I It is the most common standard for fax transmission It is accepted worldwide, almost every fax machine supports this standard It uses compression algorithms for bilevel images Gabriele Monfardini - Corso di Basi di Dati Multimediali a.a. 2004-2005 4 CCITT Group 3 - II Paper size: international A4 (not US letter) standard resolution 204x98 dpi (200x100) high resolution 204x196 dpi (200x200) bilevel image 1 bit/pixel image size: 1728x1188 bits at standard resolution about 2 Mbit Transmission rate: 4.8 Kbit/s 1728 bits/line 1188 lines/page today is usually higher, 14.4 – 33.6 Kbit/s At 4.8Kbit/s in std resolution one page would take about 430 sec, but only 1 minute on average with Group 3 algorithms 5 Run-length coding Each scan line is composed by sequences of pixel of the same color Count the number of element of each run Example 3w 4b 9w 2b 2w 6b 5w 2b 5w... Gabriele Monfardini - Corso di Basi di Dati Multimediali a.a. 2004-2005 6 G3 1D Group 3 One-Dimensional coding (G3 1D) is called Modified Huffman (MH) as it encodes run lengths using a predefined Huffman code In order to maintain black/white syncronization, each line begins with a white run, eventually of zero length Gabriele Monfardini - Corso di Basi di Dati Multimediali a.a. 2004-2005 7 G3 1D 1000 011 10100 11 0111 0010 ... predefined Huffman codewords have been found from the probabilities of the runs in typical handwritten documents Gabriele Monfardini - Corso di Basi di Dati Multimediali a.a. 2004-2005 8 G3 1D As one line has 1728 bits, we have to define a codeword for all 1728 black and white run lengths As shorter runs occur more frequently that longer runs, we code each run length in an additive form there is a terminating and makeup codeword Lengths form 0 to 63 are coded with a single terminating codeword Longer runs are coded with one or more makeup codewords and a terminating codeword Each line is terminated with a EOL symbol composed of eleven 0 and one 1 Gabriele Monfardini - Corso di Basi di Dati Multimediali a.a. 2004-2005 9 G3 2D Group 3 Two-Dimensional coding (G3 2D) is called Modified READ (MR) as it is a variant of a previously defined code, called READ (Relative Element Address Designate) Many images have a high degree of vertical coherence between consecutive lines changing elements are coded w.r.t. a “nearby” change position of the same color in the previous (reference) line 10 G3 2D Nearby means within an interval of radius 3 pixels If there are changing elements in the current line without correspondents in the reference line switch to horizontal mode (1D) On the opposite if the ref line has a run with no counterpart in the current line special pass code Gabriele Monfardini - Corso di Basi di Dati Multimediali a.a. 2004-2005 11 G3 2D reference line current line vertical mode horizontal pass vertical mode code mode -1 0 from a Huffman table, generated with codewords for code -3, -2, -1, 0, +1, +2, +3 ... +2 -2 <mode | length of preceding white run | length of black run> 0001 12 G3 2D Two dimensional coding is more prone to transmission errors In the G3 1D an error may cause problems in the entire line, but syncronization is forced back by EOL codeword Here an error in the reference line is likely propagated in all the other lines For this reason there are 1 reference line for each k lines (i.e. k-1 are coded w.r.t. each ref line) standard resolution k=2 high resolution k=4 Gabriele Monfardini - Corso di Basi di Dati Multimediali a.a. 2004-2005 13 CCITT fax standard compression performances Standard resolution (~200x100 dpi) High resolution (~200x200 dpi) G3 1D 0.13 bits/pixel 57s. for A4 at 4.8 Kbps G3 2D (k=2) 0.11 bits/pixel 47s. for A4 at 4.8 Kbps G3 2D (k=4) 0.09 bits/pixel 74s. for A4 at 4.8 Kbps Compression is very good for office image where run lengths are long It would be very bad for bilevel natural images 14 Continuous-tone images: why lossless compression? lossy compression is often preferred to have remarkably more compressed images, with good quality However there are some situations in which using an approximation may not be adequate medical images historical documents images with legal relevance Gabriele Monfardini - Corso di Basi di Dati Multimediali a.a. 2004-2005 15 Continuous-tone images: lossless compression GIF standard PNG standard JPEG-LS It is a quite new standard. The original JPEG standard included a lossless mode, but its performances were not close to ‘state of the art’ extimation of pixel value using quite simple context: effective and low cost solution www.hpl.hp.com/loco 16 GIF image format - I Adopted by CompuServe to minimize the time required to download images over a modem link The most widely used lossless image format until 1995 8-bit pixel description 256 color images, but it is possible to use a color map Gabriele Monfardini - Corso di Basi di Dati Multimediali a.a. 2004-2005 17 GIF image format - II The color map can be specified for each image or can be omitted if specified, it is included as an header into image file, in uncompressed form color map is composed of 256 24-bit entries, that specify 256 RGB colors Compression scheme used is LZW Alphabet symbols are the 256 colors of the color map plus a “clear” code and an “end-of-information” code Gabriele Monfardini - Corso di Basi di Dati Multimediali a.a. 2004-2005 18 GIF image format - III Even if this feature is not widely used, GIF files may contain more than one image, and it is possible to share the color map LZW-coded information is grouped into blocks preceded by a byte-count, in order to skip an image without decompressing it In 1995 Unisys announced that there would be royalties on GIF implementations due to an old patent they held on LZW This catalyzed the development of a new lossless image format, designed for public domain and with the last improvements 19 PNG image format - I Portable Network Graphics (pronounced “ping”) it uses gzip compression scheme through some improvements compression obtained is about 10-30% better than GIF By default it encodes the pixels in raster scan order, but some other methods are available it is possible to code horizontal difference, i.e. the difference between current pixel value and the previous one or vertical difference, i.e. the difference w.r.t. the above pixel average difference, the difference with the average of above and next pixel ... Gabriele Monfardini - Corso di Basi di Dati Multimediali a.a. 2004-2005 20 PNG image format - II It is possible to use more than 256 colors, up to 16 bit grayscale and 48 bit color GIF uses one special pixel value to indicate transparency, PNG uses 256 different values per pixel, allowing for picture progressively fading into the background It seems inevitable that PNG format will gradually assume the role of standard lossless image format for the WWW, replacing GIF Gabriele Monfardini - Corso di Basi di Dati Multimediali a.a. 2004-2005 21 Continuous-tone images: why lossy compression? Digital images are yet an approximation of the real analog phenomenon lossy techniques allow to obtain very good compression with a modest lost of details This is useful for storing and trasmitting images Gabriele Monfardini - Corso di Basi di Dati Multimediali a.a. 2004-2005 22 Continuous-tone images: lossy compression JPEG JPEG2000 a new image coding system that uses stateof-the-art compression techniques based on wavelet technology file extension .jp2 With very compressed files, if image size is the same, perceived quality of JPEG2000 images is better w.r.t. JPEG images Gabriele Monfardini - Corso di Basi di Dati Multimediali a.a. 2004-2005 23 JPEG format - I JPEG is a standard defined by the Joint Photographic Experts Group in 1992 It was conceived to transmit images at 64 Kbps It has a lossy mode and a lossless mode (not so much used, and today replaced by the JPEG-LS standard) With lossy mode it allows to obtain very good quality at about 1 bit/pixel Implementation complexity is reasonable Gabriele Monfardini - Corso di Basi di Dati Multimediali a.a. 2004-2005 24 JPEG format - II It could be used with graylevel and color images Each channel of the color space (RGB, YUV...) is treated separately it allows progressive transmission (that is much better suited for WWW than raster transmission) Gabriele Monfardini - Corso di Basi di Dati Multimediali a.a. 2004-2005 25 Raster vs. progressive transmission JPEG Coder - I Discrete Cosine Transform Quantization Binary Encoder Gabriele Monfardini - Corso di Basi di Dati Multimediali a.a. 2004-2005 27 JPEG Coder - II Image is divided in 8x8-pixel squares Preprocessing Apply Discrete Cosine Transform on each square Coefficient quantization Bit stream encoding Gabriele Monfardini - Corso di Basi di Dati Multimediali a.a. 2004-2005 28 Preprocessing: color space transformation & downsampling from RGB into YUV The Y component represents the brightness of a pixel, and the U and V components together represent the hue and saturation Human eye can see more detail in the Y component than in the U and V, that can be compressed more aggressively 4:4:4 4:2:2 4:2:0 no downsampling horizontal downsampling of a factor 2 both horizontal and vertical downsampling Gabriele Monfardini - Corso di Basi di Dati Multimediali a.a. 2004-2005 29 Discrete Cosine Transform - I The discrete cosine transform (DCT) is a Fourier-related transform similar to the discrete Fourier transform (DFT), but using only real numbers It is used in JPEG because it is fast and quite easy to implement efficiently Gabriele Monfardini - Corso di Basi di Dati Multimediali a.a. 2004-2005 30 Discrete Cosine Transform - II where the block is N1 N 2 pixels (in JPEG, 8x8) A(i,j) is the value of pixel of position (i,j) B(k1 ,k 2 ) is the DCT coefficient of position (k1 ,k 2 ) low values for k1 corresponds to low vertical frequencies, low values for k 2 to low horizontal frequencies Generally higher frequencies have very low values Gabriele Monfardini - Corso di Basi di Dati Multimediali a.a. 2004-2005 31 Discrete Cosine Transform - III DCT function basis each 8x8 square is reduced to 64 coefficient Gabriele Monfardini - Corso di Basi di Dati Multimediali a.a. 2004-2005 32 Discrete Cosine Transform - IV Knowing with infinite precision the 64 DCT coefficient it is possible to reconstruct exactly the pixels of the square But finite precision quantization of the coefficients (always) Some coefficient related to high frequency are not transmitted. This allows higher compression without sacrifying too much quality as human eye is less responsible Gabriele Monfardini - Corso di Basi di Dati Multimediali a.a. 2004-2005 33 Quantization - I The DCT matrix obtained is scaled differently in each component, dividing each by a diferent factor the factor for each component has been decided based on human sensitivity to changes at each frequency In practice the matrix of factor is usually 34 Quantization - II Next, all values are rounded to nearest integer This leads to a quite high number of 0s in the high frequency zone, as factors are bigger Gabriele Monfardini - Corso di Basi di Dati Multimediali a.a. 2004-2005 35 Zig-zag scan Low frequency coefficients are transmitted before higher frequency coefficients This allows for progressive visualization of this 8x8 block Gabriele Monfardini - Corso di Basi di Dati Multimediali a.a. 2004-2005 36 Raster vs. progressive transmission Raster transmission DCT coefficient of the upper left block, then those of all the others in the upper part of the image and so on Progressive transmission first all (0,0) coefficients, than all (0,1) and so on, following zig-zag scan in each block Gabriele Monfardini - Corso di Basi di Dati Multimediali a.a. 2004-2005 37 Binary coding DCT(0,0) has usually a very slow variation from one block to the next, as it is the mean value Tipically the bit stream is coded with Huffman For this reason it is convenient to encode the difference from the previous value It is possible to use arithmetic scheme, gaining some compression at cost of decoding speed Huffman codes are predefined, or it is possible to build optimal tables and insert them in the stream Gabriele Monfardini - Corso di Basi di Dati Multimediali a.a. 2004-2005 38 JPEG Decoder Binary Decoder Some values are lost! Dequantization Inverse DCT Good quality, but reconstruction is not exact Gabriele Monfardini - Corso di Basi di Dati Multimediali a.a. 2004-2005 39 JPEG performances - I Gabriele Monfardini - Corso di Basi di Dati Multimediali a.a. 2004-2005 40 JPEG performances - II Original Quality factor 75 Quality factor 20 Quality factor 3 41