Chapter 1. Digital image representation

Virtual image, a point or system of points, on one side of a mirror or lens, which, if it existed, would emit the system of rays which actually exists on the other side of the mirror or lens.

--Clerk Maxwell

Vector images

One way to describe an image using numbers is to declare its contents using position and size of geometric forms and shapes like lines, curves, rectangles and circles; such images are called vector images.

Coordinate system

We need a coordinate system to describe an image. The coordinate system used to place elements in relation to each other is called user space, since these are the coordinates the user uses to define elements and position them in relation to each other.

Figure 1.1. Coordinate system.

Coordinate system.

The coordinate system used for all examples in this document has the origin in the upper left, with the x axis extending to the right and y axis extending downwards.

Defining shapes

It would have been nice to draw a smiling face instead of the dissatisfied face on the left. This could be achieved with a Bézier curve or a segment of a circle, but since this text focuses mainly on raster graphics, that would probably be too complex.

A simple image of a face can be declared as follows:

Figure 1.2. Vector image

draw circle
     center        0.5, 0.5
     radius        0.4
     fill-color    yellow
     stroke-color  black
     stroke-width  0.05
draw circle
     center        0.35, 0.4
     radius        0.05
     fill-color    black
draw circle
     center        0.65, 0.4
     radius        0.05
     fill-color    black
draw line
     start         0.3, 0.6
     end           0.7, 0.6
     stroke-color  black
     stroke-width  0.1
         
Vector image
A vector image of a face, and the instructions used to create the image.

The preceding description of an image can be seen as a “cooking recipe” for how to draw the image. It contains geometrical primitives like lines, curves and circles, describing color as well as the relative size, position and shape of elements. Before the image can be displayed it has to be translated into a bitmap image; this process is called rasterization.
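As a rough illustration of rasterization, the sketch below turns one filled circle from the recipe into a small pixel grid. The function name rasterize_circle and the rule "a pixel is set when its center lies inside the circle" are assumptions made for this example; real rasterizers also handle strokes, colors and anti-aliasing.

```python
def rasterize_circle(cx, cy, radius, width, height):
    """Rasterize a filled circle given in user space (0..1) to a pixel grid.

    A pixel is set to 1 when its center lies inside the circle, else 0.
    """
    grid = []
    for row in range(height):
        scan_line = []
        for col in range(width):
            # Map the pixel center back into user space.
            x = (col + 0.5) / width
            y = (row + 0.5) / height
            inside = (x - cx) ** 2 + (y - cy) ** 2 <= radius ** 2
            scan_line.append(1 if inside else 0)
        grid.append(scan_line)
    return grid


# The large circle of the face, rasterized to 10x10 pixels.
face_outline = rasterize_circle(0.5, 0.5, 0.4, 10, 10)
for scan_line in face_outline:
    print("".join("#" if value else "." for value in scan_line))
```

Calling the same function with a larger width and height gives a finer grid from the same description, which is what makes the vector form convenient to scale.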

A vector image is resolution independent: you can enlarge or shrink the image without affecting the output quality. Vector images are the preferred way to represent fonts, logos and many illustrations.

Bitmap images

Bitmap images, also called raster [1] images, are “digital photographs”. They are the most common way to represent natural images and other forms of graphics that are rich in detail. Bitmap images are also how graphics are stored in the video memory of a computer. The term bitmap refers to how a given pattern of bits in a pixel maps to a specific color.

[Note]Note

In the other chapters of introduction to image molding, raster images are the only topic.

Figure 1.3.  Raster image

Raster image
A rasterized form of the letter 'a' magnified 16 times using pixel doubling

A bitmap image takes the form of an array, where the value of each element, called a pixel (picture element), corresponds to the color of that portion of the image. Each horizontal line in the image is called a scan line.

The letter 'a' might be represented in a 12x14 matrix as depicted in Figure 1.3. The values in the matrix depict the brightness of the pixels (picture elements). Larger values correspond to brighter areas, whilst lower values are darker.
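A minimal sketch of such a matrix, using a list of scan lines. The 4x3 pixel values and the helper get_pixel are made up for illustration and are not taken from the figure.

```python
# A tiny 4x3 grayscale bitmap stored as a list of scan lines.
# Larger values are brighter: 0 is black, 255 is white.
bitmap = [
    [0,   0,   255, 255],
    [0,   128, 128, 255],
    [255, 255, 0,   0],
]

height = len(bitmap)      # number of scan lines
width = len(bitmap[0])    # number of pixels per scan line


def get_pixel(image, x, y):
    """Return the value of the pixel in column x of scan line y."""
    return image[y][x]


second_scan_line = bitmap[1]
print(width, height, get_pixel(bitmap, 1, 1))
```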

Sampling

When measuring the value for a pixel, one takes the average color of an area around the location of the pixel. A simplistic model is to sample a square; this is called a box filter. A more physically accurate measurement is to calculate a weighted Gaussian average, giving the value exactly at the pixel coordinates a high weight and the area around it a lower weight. When perceiving a bitmap image, the human eye should blend the pixel values together, recreating an illusion of the continuous image it represents.
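The box filter can be approximated by averaging a handful of evenly spaced sub-samples inside the pixel's square. In the sketch below, inside_disk is a hypothetical continuous image (white disk on black) and box_sample is an illustrative name; a Gaussian filter would replace the plain average with a weighted one.

```python
def inside_disk(x, y):
    """Hypothetical continuous image: 1.0 inside a disk, 0.0 outside."""
    return 1.0 if (x - 0.5) ** 2 + (y - 0.5) ** 2 <= 0.4 ** 2 else 0.0


def box_sample(image_fn, col, row, width, height, n=4):
    """Average n*n evenly spaced sub-samples over one pixel's square."""
    total = 0.0
    for j in range(n):
        for i in range(n):
            # Sub-sample position inside the pixel, mapped to 0..1 space.
            x = (col + (i + 0.5) / n) / width
            y = (row + (j + 0.5) / n) / height
            total += image_fn(x, y)
    return total / (n * n)


# Sample the disk on a 4x4 pixel grid.
center_pixel = box_sample(inside_disk, 2, 2, 4, 4)
corner_pixel = box_sample(inside_disk, 0, 0, 4, 4)
print(center_pixel, corner_pixel)
```

A pixel fully covered by the disk comes out white, while a pixel on the edge gets an intermediate gray that reflects how much of its area is covered.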

Image dimensions

The number of horizontal and vertical samples in the pixel grid is called the image dimensions; it is specified as width x height.

Resolution

Resolution is a measurement of sampling density. The resolution of a bitmap image gives the relationship between pixel dimensions and physical dimensions. The most often used measurement is ppi, pixels per inch [2].
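The relationship between pixel dimensions, resolution and physical size is a simple division. A small sketch, with the function name and the 300 ppi figure chosen only for illustration:

```python
def physical_size_inches(width_px, height_px, ppi):
    """Physical width and height, in inches, at a given pixels-per-inch."""
    return width_px / ppi, height_px / ppi


# A 1200x900 pixel image reproduced at 300 ppi.
print(physical_size_inches(1200, 900, 300))  # (4.0, 3.0)
```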

Figure 1.4. Sampling grid

Sampling grid
A rasterized form of the letter 'a' magnified 16 times, where each pixel is represented as a circle instead of a square.

Megapixels

Megapixels refer to the total number of pixels in the captured image. An easier metric is image dimensions, which represent the number of horizontal and vertical samples in the sampling grid. An image with a 4:3 aspect ratio and dimensions of 2048x1536 pixels contains a total of 2048x1536 = 3,145,728 pixels, approximately 3 million; thus it is a 3 megapixel image.
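The arithmetic can be checked with a short sketch; the function name megapixels is an illustrative choice, and one megapixel is taken to be one million pixels.

```python
def megapixels(width, height):
    """Total pixel count expressed in millions of pixels."""
    return width * height / 1_000_000


total_pixels = 2048 * 1536
print(total_pixels)                       # 3145728
print(round(megapixels(2048, 1536), 1))   # 3.1
```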

Table 1.1. Common image dimensions

Dimensions  Megapixels  Name               Comment
----------  ----------  -----------------  -------
640x480     0.3         VGA                VGA
720x576     0.4         CCIR 601 DV PAL    Dimensions used for PAL DV, and PAL DVDs
768x576     0.4         CCIR 601 PAL full  PAL with square sampling grid ratio
800x600     0.4         SVGA
1024x768    0.8         XGA                The currently (2004) most common computer screen dimensions.
1280x960    1.2
1600x1200   2.1         UXGA
1920x1080   2.1         1080i HDTV         Interlaced, high resolution digital TV format.
2048x1536   3.1         2K                 Typically used for digital effects in feature films.
3008x1960   5.3
3088x2056   6.3
4064x2704   11.1

Scaling / Resampling

When we need an image with different dimensions from the one we have, we scale the image. A different name for scaling is resampling: resampling algorithms try to reconstruct the original continuous image and create a new sample grid from it.

Reducing image dimensions

The process of reducing the image dimensions is called decimation. This can be done by averaging the values of the source pixels contributing to each output pixel.
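A minimal sketch of decimation by averaging, restricted to the simplest case of halving both dimensions. The function name halve and the assumption of a grayscale image with even width and height are choices made for this example.

```python
def halve(image):
    """Halve width and height by averaging each 2x2 block of source pixels.

    Assumes a grayscale image stored as a list of scan lines, with even
    width and height.
    """
    out = []
    for y in range(0, len(image), 2):
        line = []
        for x in range(0, len(image[0]), 2):
            block_sum = (image[y][x] + image[y][x + 1]
                         + image[y + 1][x] + image[y + 1][x + 1])
            line.append(block_sum / 4)
        out.append(line)
    return out


small = halve([
    [0, 4, 8, 8],
    [0, 4, 8, 8],
])
print(small)  # [[2.0, 8.0]]
```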

Increasing image dimensions

When we increase the image size we actually want to create sample points between the original sample points in the original raster. This is done by interpolating the values in the sample grid, effectively guessing the values of the unknown pixels [3].
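As a sketch of the idea, the function below applies linear interpolation to a single scan line; a full image would typically repeat this in both directions (bilinear interpolation). The name upsample_line and the choice of linear interpolation are assumptions for illustration, and other interpolation methods exist.

```python
def upsample_line(line, factor):
    """Linearly interpolate one scan line.

    Inserts factor - 1 new samples between each pair of neighbours, so the
    result has (len(line) - 1) * factor + 1 samples.
    """
    out = []
    for i in range(len(line) - 1):
        a, b = line[i], line[i + 1]
        for step in range(factor):
            t = step / factor          # position between a and b, 0 <= t < 1
            out.append(a + (b - a) * t)
    out.append(line[-1])               # keep the final original sample
    return out


print(upsample_line([0, 10, 20], 2))  # [0.0, 5.0, 10.0, 15.0, 20]
```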

Sample depth

The values of the pixels need to be stored in the computer's memory, which means that the data ultimately needs to end up in a binary representation. The spatial continuity of the image is approximated by the spacing of the samples in the sample grid. The values we can represent for each pixel are determined by the sample format chosen.

Figure 1.5. Sample depth

Sample depth
The same image with varying sample depths. Note that high frequency areas (detailed areas) look OK at lower sample depths than low frequency areas.

8bit

A common sample format is 8bit integers. 8bit integers can only represent 256 discrete values (2^8 = 256), thus brightness levels are quantized into these levels.
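Quantization can be sketched as mapping a continuous brightness to one of 2^bits integer levels. The function name quantize and the convention that brightness runs from 0.0 to 1.0 are assumptions made for this example.

```python
def quantize(value, bits):
    """Map a brightness in 0.0..1.0 to an integer level of the given depth."""
    levels = 2 ** bits
    # Clamp so that a brightness of exactly 1.0 maps to the highest level.
    return min(int(value * levels), levels - 1)


print(quantize(0.0, 8), quantize(0.5, 8), quantize(1.0, 8))  # 0 128 255

# With only 3 bits, a smooth ramp collapses into 8 distinct levels.
ramp = {quantize(i / 999, 3) for i in range(1000)}
print(len(ramp))  # 8
```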

12bit

For high dynamic range images (images with detail both in shadows and highlights), the 256 discrete values of 8bit samples do not provide enough precision to store an accurate image. Some digital cameras operate with more than 8bit samples internally, and higher end cameras (mostly SLRs) also provide RAW images that are often 12bit (2^12 = 4096).

16bit

The PNG and TIFF image formats support 16bit samples. Many image processing and manipulation programs perform their operations in 16bit when working on 8bit images to avoid quality loss in processing. The film industry in Hollywood often uses floating point values to represent images, to preserve both contrast and information in shadows and highlights.

Colors

The most common way to model color in computer graphics is the RGB color model, which corresponds to the way both CRT monitors and LCD screens/projectors reproduce color. Each pixel is represented by three values: the amounts of red, green and blue. Thus an RGB color image will use three times as much memory as a gray-scale image of the same pixel dimensions.

Figure 1.6. RGB bands

RGB bands
Color image built up of bands of red, green and blue color. (This image illustrates how a laptop display is constructed. Note that it is best viewed on a computer screen, and not in print.)

One of the most common pixel formats is 8bit RGB, where the red, green and blue values are stored interleaved in memory. This memory layout is often referred to as chunky. Storing the components in separate buffers is called planar, and is not as common.
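The two layouts can be sketched with plain byte strings. The three-pixel buffer and the function names below are made up for illustration.

```python
# Chunky (interleaved) layout: R, G, B for pixel 0, then R, G, B for pixel 1...
chunky = bytes([255, 0, 0,   0, 255, 0,   0, 0, 255])  # red, green, blue pixels


def chunky_to_planar(data):
    """Split interleaved 8bit RGB data into separate R, G and B buffers."""
    return data[0::3], data[1::3], data[2::3]


def planar_to_chunky(r, g, b):
    """Interleave separate R, G and B buffers back into chunky RGB data."""
    out = bytearray()
    for triple in zip(r, g, b):
        out.extend(triple)
    return bytes(out)


r, g, b = chunky_to_planar(chunky)
print(list(r), list(g), list(b))  # [255, 0, 0] [0, 255, 0] [0, 0, 255]
```

In the chunky layout, the red sample of the pixel at column x of scan line y sits at byte offset (y * width + x) * 3.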

Palette / Indexed images

It was earlier common to store images in a palettized mode, which works like a paint by numbers strategy. For each pixel we store just the number of the palette entry used, and for each palette entry we store the amount of red, green and blue light.
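A minimal sketch of the lookup, with a made-up three-entry palette and a tiny index grid; the name to_rgb is illustrative.

```python
# Each palette entry stores an (R, G, B) triple.
palette = [
    (0, 0, 0),        # entry 0: black
    (255, 255, 0),    # entry 1: yellow
    (255, 255, 255),  # entry 2: white
]

# The image itself stores only palette entry numbers.
indexed = [
    [2, 1, 1, 2],
    [1, 0, 0, 1],
    [2, 1, 1, 2],
]


def to_rgb(index_image, colors):
    """Expand an indexed image to RGB by looking up each pixel's entry."""
    return [[colors[i] for i in scan_line] for scan_line in index_image]


rgb = to_rgb(indexed, palette)
print(rgb[1][1])  # (0, 0, 0)
```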

Figure 1.7. Indexed image

Indexed image
On the left, an image using just 16 colors; on the right, the palette used for this image. The way an indexed/paletted image works is similar to how paint by numbers works.

Image compression

Bitmap images take up a lot of memory; image compression reduces the amount of memory needed to store an image. For instance a 2.1 megapixel, 8bit RGB image (1600x1200) occupies 1600x1200x3 bytes = 5,760,000 bytes, or about 5.5 megabytes. This is the uncompressed size of the image.

Compression ratio is the ratio between the size of the compressed image and the size of the uncompressed image. If the example image mentioned above was stored as a 512 kB JPEG file, the compression ratio would be 0.5 MB : 5.5 MB = 1:11.
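The numbers in this example can be reproduced with a short sketch. The function name is illustrative, and a megabyte is taken here as 2^20 bytes and a kilobyte as 2^10 bytes, which is the convention that yields the 5.5 figure.

```python
def uncompressed_bytes(width, height, channels=3, bytes_per_sample=1):
    """Memory needed to store every sample of an image without compression."""
    return width * height * channels * bytes_per_sample


size = uncompressed_bytes(1600, 1200)
print(size)                       # 5760000
print(round(size / 2 ** 20, 1))   # 5.5 (megabytes)

compressed = 512 * 1024           # a 512 kB file
print(round(size / compressed))   # 11, so the ratio is about 1:11
```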

Lossless Image Compression

When an image is losslessly compressed, repetition and predictability are used to represent all the information using less memory, and the original image can be restored exactly. One of the simplest lossless image compression methods is run-length encoding, which encodes consecutive identical values as one token in a data stream.
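A minimal sketch of run-length encoding for a two-color image, treating the bitmap as one long sequence of 0/1 pixels and storing only the lengths of alternating runs. The function names and the convention that the first run counts pixels of color 0 (and may be zero) are assumptions for this example.

```python
def rle_encode(pixels, first=0):
    """Encode a sequence of 0/1 pixels as alternating run lengths.

    The first run counts pixels equal to `first`; it is 0 if the sequence
    starts with the other color.
    """
    runs = []
    current = first
    count = 0
    for p in pixels:
        if p == current:
            count += 1
        else:
            runs.append(count)
            current = p
            count = 1
    runs.append(count)
    return runs


def rle_decode(runs, first=0):
    """Rebuild the pixel sequence from alternating run lengths."""
    pixels = []
    current = first
    for n in runs:
        pixels.extend([current] * n)
        current = 1 - current   # switch between the two colors
    return pixels


pixels = [0, 0, 0, 1, 1, 0]
runs = rle_encode(pixels)
print(runs)  # [3, 2, 1]
```

Decoding the runs gives back exactly the original pixels, which is what makes the method lossless.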

Figure 1.8. Run-length encoding

Run-length encoding
70,
5, 25,
5, 27,
4, 26,
4, 25,
6, 24,
6, 23,
3, 2, 3, 22,
3, 2, 3, 21,
3, 5, 2, 20,
3, 5, 2, 19,
3, 7, 2, 18,
3, 7, 2, 17,
14, 16,
14, 15,
3, 11, 2, 14,
3, 11, 2, 13,
3, 13, 2, 12,
3, 13, 2, 11,
3, 15, 2, 10,
3, 15, 2, 8,
6, 12, 6, 6,
6, 12, 6, 64
           

In Figure 1.8, “Run-length encoding”, a black and white image of a house has been compressed with run-length encoding. The bitmap is considered as one long string of black or white pixels, and the encoding records how many pixels of the same color occur after each other. We'll further reduce the number of bytes taken up by these 73 numerical values by having a maximum span length of 15, and encoding longer spans as multiple spans separated by zero length spans of the other color.

70,                     15,  0, 15,  0, 15,  0, 10,
 5, 25,                  5, 15,  0, 10,
 5, 27,                  6, 15,  0, 12,
 4, 26,                  4, 15,  0, 11,
 4, 25,                  4, 15,  0, 10,
 6, 24,                  6, 15,  0,  9,
 6, 23,                  6, 15,  0,  8,
 3,  2, 3, 22,           3,  2,  3, 15,  0,  7,
 3,  2, 3, 21,           3,  2,  3, 15,  0,  6,
 3,  5, 2, 20,           3,  5,  2, 15,  0,  5,
 3,  5, 2, 19,           3,  5,  2, 15,  0,  4,
 3,  7, 2, 18,           3,  7,  2, 15,  0,  3,
 3,  7, 2, 17,           3,  7,  2, 15,  0,  2
14, 16,                 14, 15,  0,  1
14, 15,                 14, 15,
 3, 11, 2, 14,           3, 11,  2, 14,
 3, 11, 2, 13,           3, 11,  2, 13,
 3, 13, 2, 12,           3, 13,  2, 12,
 3, 13, 2, 11,           3, 13,  2, 11,
 3, 15, 2, 10,           3, 15,  2, 10,
 3, 15, 2,  8,           3, 15,  2,  8,
 6, 12, 6,  6,           6, 12,  6,  6,
 6, 12, 6, 64            6, 12,  6, 15, 0, 15, 0, 15, 0, 15, 0, 4
  

The new encoding is 113 nibbles long. A nibble is 4 bits and can represent the values 0-15, thus we need 57 bytes to store all our values, which is less than the 94 bytes we would have needed to store the image as a 1bit image, and much less than the 750 bytes needed if we used a byte for each pixel. Run-length encoding algorithms used in file formats would probably use additional means to compress the RLE stream achieved here.
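The span splitting and nibble packing described above could be sketched as follows. The function names, the high-nibble-first packing order and the padding of an odd-length stream with a trailing zero are assumptions made for this example, not a description of any particular file format.

```python
def split_runs(runs, max_run=15):
    """Split runs longer than max_run into several runs of the same color,
    separated by zero-length runs of the other color."""
    out = []
    for n in runs:
        while n > max_run:
            out.extend([max_run, 0])
            n -= max_run
        out.append(n)
    return out


def pack_nibbles(values):
    """Pack values in the range 0..15 two per byte, high nibble first."""
    if len(values) % 2:
        values = values + [0]   # pad with a harmless zero-length run
    return bytes((hi << 4) | lo for hi, lo in zip(values[0::2], values[1::2]))


print(split_runs([5, 25]))           # [5, 15, 0, 10]
print(split_runs([64]))              # [15, 0, 15, 0, 15, 0, 15, 0, 4]
print(len(pack_nibbles([1] * 113)))  # 57
```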

Lossy Image Compression

Lossy image compression takes advantage of the human eye's ability to hide imperfections and of the fact that some types of information are more important than others. Changes in luminance are, for instance, seen as more significant by a human observer than changes in hue.

JPEG is a file format implementing compression based on the Discrete Cosine Transform (DCT); together with lossless algorithms this provides good compression ratios. The way JPEG works is best suited for images with continuous tonal ranges, like photographs. Logos, scanned text and other images with lots of sharp contours or lines will get more compression artifacts than photographs.

Loss through Generations

Lossy compression algorithms should not be used as a working format. Only final copies should be saved as JPEG, since loss accumulates over generations.

Figure 1.9. JPEG generation loss

JPEG generation loss

An image specially constructed to show the deficiencies in the JPEG compression algorithm, saved, reopened and saved again 9 times.

JPEG is most suited for photographic content, where the adverse effect of the compression algorithm is not so evident.

JPEG is not suited as an intermediate format; only use JPEG for final distribution where file size actually matters.

File formats and applications

Many applications have their own internal file format, while other formats are more suited for interchange of data. Table 1.2 and Table 1.3 list some of the most common image formats.

Table 1.2. Vector File Formats

Extension  Name                        Notes
---------  --------------------------  -----
.ai        Adobe Illustrator Document  Native format of Adobe Illustrator (based on .eps)
.eps       Encapsulated Postscript     Industry standard for including vector graphics in print
.ps        PostScript                  Vector based printing language, used by many Laser printers, used as electronic paper for scientific purposes.
.pdf       Portable Document Format    Modernized version of ps, adopted by the general public as 'electronic print version'
.svg       Scalable Vector Graphics    XML based W3C standard, incorporating animation, gaining adoption.
.swf       Shockwave Flash             Binary vector format, with animation and sound, supported by most major web browsers.

Table 1.3. Raster File Formats

Extension    Name                              Notes
-----------  --------------------------------  -----
.gif         Graphics Interchange Format       8bit indexed bitmap format, superseded by PNG on all accounts but animation
.jpg         Joint Photographic Experts Group  Lossy compression format well suited for photographic images
.png         Portable Network Graphics         Lossless compression format, supporting 16bit sample depth and alpha channel
.tiff, .tif  Tagged Image File Format
.psd         Photoshop Document                Native format of Adobe Photoshop, allows layers and other structural elements
.raw         Raw image file                    Direct memory dump from a digital camera, contains the direct imprint from the imaging sensor, before Bayer interpolation and other color corrections.
.xcf         Gimp Project File                 GIMP's native image format.


[1] raster n: formation consisting of the set of horizontal lines that is used to form an image on a CRT

[2] The difference between ppi and dpi is the difference between pixels and dots: pixels can represent multiple values, whilst a dot is a monochrome spot of ink or toner of a single colorant as produced by a printer. Printers use a process called halftoning to create a monochrome pattern that simulates a range of intensity levels.

[3] When using the digital zoom of a camera, the camera uses interpolation to guess the values that are not present in the image. Capturing an image at the maximum analog zoom level, and doing the post processing of cropping and rescaling on the computer, will give equal or better results.