Quantization

From LavryEngineering

Overview

The term "quantization" is used in digital audio to describe the process of taking a virtually infinite number of values within a fixed range and assigning a fixed number of values to represent the information.

Basics

In order to realize a useful digital audio system, a decision must be made as to the minimum level of accuracy the process requires to satisfy the demand. Higher quality audio demands more accurate information. As Sample frequency and Wordlength increase, the amount of information required to encode the audio increases proportionally.
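As a rough illustration of that proportionality, the raw data rate of uncompressed linear PCM is the product of sample frequency, Wordlength, and channel count. The sketch below is only an illustration; the function name `pcm_bit_rate` and the example figures (44.1 kHz stereo, 96 kHz stereo) are assumptions chosen for the example, not values from this article.

```python
def pcm_bit_rate(sample_rate_hz, wordlength_bits, channels=1):
    """Raw (uncompressed) linear PCM data rate in bits per second."""
    return sample_rate_hz * wordlength_bits * channels

# Example figures are assumptions chosen for illustration.
print(pcm_bit_rate(44_100, 16, channels=2))   # 1411200
print(pcm_bit_rate(96_000, 24, channels=2))   # 4608000
```

Doubling either the sample frequency or the Wordlength doubles the result.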

Sample Frequency (SF) basically determines the bandwidth, or in other words, the highest frequency which can be accurately encoded assuming the lower limit is “zero Hz.”

Wordlength determines the maximum Dynamic range which can be accurately encoded.

Because contemporary digital audio systems like PCM are based on fixed SF for encoding and decoding, the range of possible time values is finite (one sample per sample period).

By contrast, Analog audio as represented by a constantly changing voltage waveform effectively has an infinite number of values between the highest and lowest voltage. Some method must be employed to reduce the possible number of values to a fixed set, and this process is quantization. A device used to perform this encoding task is referred to as an Analog to digital converter.

Quantization in conversion

In linear PCM digital audio, the Wordlength determines into how many equal-sized steps the amplitude of the signal is divided. Because digital audio is based on binary math, the number of possible steps or “states” is equal to 2 raised to the power X, where X = wordlength in bits. For example, if the incoming audio has a maximum level of ±1.28 volts, and 8 bit encoding is used, the total voltage range would be 2.56 volts and this would be divided into 256 equal-sized steps of 0.01 volt (2 raised to the power of 8 = 256). Although this level of accuracy might be adequate for intelligibility of full level voice signals, the distortion of lower level signals would be very obvious.
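The 8 bit example can be sketched in a few lines of Python. This is a minimal illustration rather than a model of any real converter: the function name `quantize`, the signed code layout, and the clipping behaviour are all assumptions made for the example.

```python
import math

def quantize(voltage, full_scale=1.28, bits=8):
    """Map a voltage to the nearest of 2**bits uniform steps.

    Returns (code, reconstructed_voltage). Codes run from -2**(bits-1)
    to 2**(bits-1) - 1, a signed layout assumed here for illustration.
    """
    step = (2 * full_scale) / (2 ** bits)      # 2.56 V / 256 = 0.01 V
    code = math.floor(voltage / step + 0.5)    # round to the nearest step
    lowest, highest = -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    code = max(lowest, min(highest, code))     # clip inputs beyond full scale
    return code, code * step

code, approx = quantize(0.5034)
print(code, round(approx, 4), round(0.5034 - approx, 4))   # 50 0.5 0.0034
```

An input of 0.5034 volts lands on step 50 and is reconstructed as 0.50 volts; the remaining 0.0034 volts is the rounding error discussed below.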

By increasing the Wordlength to 16 bits, the same voltage range is divided into 65,536 steps. In combination with use of a SF high enough to allow recording frequencies as high as 20 kHz, high quality digital audio became possible. This is the standard for “CD Quality” audio. However, even with this level of accuracy, the distortion of low level audio due to quantization error makes the use of dither necessary to achieve audible results rivalling high quality analog circuitry.

In linear PCM encoding the step size is constant, so the maximum size of the quantization error is the same at every signal level. The error is a rounding error: it arises whenever the analog input voltage effectively falls “between” voltage steps of the AD converter, and the converter must either “round” the output up to the next higher step or down to the next lower step. In reality, virtually every sample contains some rounding error due to the nature of quantization; but the important factor is the size of the error in proportion to the desired analog signal.

Near full scale digital level, the rounding error is a very small percentage of the input signal; but at very low signal levels the percentage becomes significant. This error can effectively be randomized by the proper application of dither and/or noise shaping to make it less audible. What makes this workable is the tendency for the human perception of audio to be most sensitive at higher acoustic levels and less sensitive as level decreases.
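A small experiment can show what randomizing the error means in practice. The sketch below assumes triangular (TPDF) dither spanning plus or minus one step, which is one common choice and not necessarily the method any particular converter uses; it also uses a constant input of 0.3 step instead of audio, purely to keep the example short. Values are in units of one quantization step.

```python
import math
import random

def quantize_to_steps(x):
    """Round a value, expressed in units of one step, to the nearest step."""
    return math.floor(x + 0.5)

def tpdf_dither(rng):
    """Triangular-PDF noise in the range -1..+1 step (assumed dither type)."""
    return rng.random() - rng.random()

rng = random.Random(0)      # fixed seed so the run is repeatable
level = 0.3                 # constant input smaller than half a step
count = 20000

plain = [quantize_to_steps(level) for _ in range(count)]
dithered = [quantize_to_steps(level + tpdf_dither(rng)) for _ in range(count)]

mean_plain = sum(plain) / count        # every sample rounds to 0
mean_dithered = sum(dithered) / count  # averages out close to 0.3
print(mean_plain, round(mean_dithered, 2))
```

Without dither the input is rounded away entirely. With dither the individual samples are noisier, but the low level information survives in the average, so the error behaves like noise and is no longer a distortion tied to the signal.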

With the advent of 24 bit digital audio, quantization error could be reduced to a vanishingly small percentage of the analog signal. The theoretical dynamic range possible with 24 bit PCM is approximately 144 dB, which exceeds the dynamic range of the vast majority of analog circuitry.
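The 144 dB figure follows from the ratio between the full range and a single step, expressed in decibels: 20 × log10(2 raised to the power of the Wordlength), or about 6.02 dB per bit. A quick check in Python (the function name is an assumption for the example):

```python
import math

def theoretical_dynamic_range_db(bits):
    """Ratio of the full range to one step, in dB (about 6.02 dB per bit)."""
    return 20 * math.log10(2 ** bits)

for bits in (8, 16, 24):
    print(bits, round(theoretical_dynamic_range_db(bits), 1))
# 8 48.2
# 16 96.3
# 24 144.5
```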

Digital Audio Technology

In order to make cost effective digital audio equipment, it is necessary to use existing computer technology. This dictates use of digital words commonly used in computers for digital audio.

The earliest computers operated on eight bit words, making high quality digital audio impossible. As technology advanced, 16 bit computers became available and the CD standard of 16 bit linear PCM made high quality digital audio a reality for consumers as well as professionals. Although it was state-of-the-art at the time, it was clear that 16 bits was not enough to equal the performance of high quality analog audio electronics. It was easy to calculate that 24 bit encoding was needed to achieve this level of accuracy. The theoretical limit of 24 bit encoding is approximately 144 dB, which exceeds the dynamic range of even the finest real-world analog audio circuitry. In practice, factors such as noise in components like resistors or semiconductors make achieving even 22-23 bit performance in conversion extremely challenging. Virtually all “24 bit AD converters” output some form of shaped noise in the two least significant bits (LSBs). In the vast majority of cases, this is well below the level of the noise of the analog source.

Although computer technology is based on 8 bit bytes, as computers evolved it made sense to make the next generation of hardware 32 bit by “doubling” 16 bit operations when moving data between processes. There were other advantages to 32 bit architecture having to do with the number of possible addresses needed for large scale memory like hard drives.

As a result, in most cases 24 bit audio is encoded in 32 bit words that have the 8 least significant bits (LSBs) set to zero. All of the original accuracy is retained in the process, and the 32 bit words “fit” into computer architecture seamlessly. The evolution of computer technology continued with 64 bit technology becoming commonplace.
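Placing a 24 bit sample in a 32 bit word this way amounts to shifting it left by 8 bits, and shifting right by 8 bits recovers it exactly. The helper names below are hypothetical and exist only to illustrate the idea; samples are treated as signed integers.

```python
def pack_24_in_32(sample_24):
    """Place a signed 24 bit sample in the upper 24 bits of a 32 bit word."""
    if not -(1 << 23) <= sample_24 < (1 << 23):
        raise ValueError("sample does not fit in 24 bits")
    return sample_24 << 8            # the 8 lowest bits become zero

def unpack_32_to_24(word_32):
    """Recover the original 24 bit sample (arithmetic shift keeps the sign)."""
    return word_32 >> 8

word = pack_24_in_32(-5)
print(word, word & 0xFF, unpack_32_to_24(word))   # -1280 0 -5
```

The round trip is lossless because the added bits carry no information.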

Any time a digitized signal is processed, even for basic level adjustment, a larger Wordlength is required to retain accuracy equal to that of the original 24 bit signal. For this reason, most digital audio processing is performed at 32 to 64 bit precision. To retain the original accuracy, the Wordlength must exceed 24 bits until all processing has been performed; only then should the Wordlength be reduced, with dithering applied to make the quantization error of low level signals less audible.
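The loss caused by rounding back to the original Wordlength after every operation can be seen with two level adjustments that should cancel out. In this sketch the function names, the sample value, and the gain values are all assumptions chosen for illustration; Python's 64 bit floats stand in for the higher precision processing path, and the final reduction uses plain rounding where a real system would add dither, to keep the output repeatable.

```python
import math

def round_to_int(x):
    """Round to the nearest integer sample value."""
    return math.floor(x + 0.5)

def gain_with_integer_intermediate(sample, gains):
    """Round back to an integer sample after every gain stage."""
    for g in gains:
        sample = round_to_int(sample * g)
    return sample

def gain_with_float_intermediate(sample, gains):
    """Keep 64 bit float precision throughout; round once at the end."""
    value = float(sample)
    for g in gains:
        value *= g
    return round_to_int(value)   # a real system would add dither before this step

gains = [0.001, 1000.0]          # 60 dB down, then 60 dB back up
print(gain_with_integer_intermediate(1001, gains))   # 1000
print(gain_with_float_intermediate(1001, gains))     # 1001
```

Rounding after the first stage discards the low level detail permanently, while the higher precision path returns the original sample value.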