Sample frequency conversion

From LavryEngineering

Overview

The term "Sample frequency conversion" and its abbreviation "SFC" are used interchangeably with the term "Sample rate conversion" and its abbreviation "SRC." These terms describe a process that takes in digital audio at its original sample frequency, re-samples it using DSP, and outputs it at a different sample frequency. SFC can be either a real-time process or an "off-line" process performed on digital audio files.

History

The need for high quality sample frequency conversion in digital audio came about in part due to the lack of a single standard for the sample frequency used by early digital audio recording systems. Prior to the Sony/Philips announcement of the CD standard of 44.1 kHz, each manufacturer used a different sample rate for their pioneering systems. Due to the need for standardization, professional audio manufacturers settled on the earliest standards of 44.1 and 48 kHz. The 44.1 kHz standard was chosen for obvious reasons: if the main form of distribution for consumers was to be the CD, systems had to be available to generate the needed digital audio recordings.

But professional audio manufacturers were also concerned about the limitations of the "barely adequate" choice of 44.1 kHz for a number of reasons, including the severe restrictions it placed on the design of the filters necessary at both AD converter inputs and DA converter outputs. A sample frequency (SF) of 44.1 kHz implies a Nyquist frequency of 22.05 kHz, and thus required filters with a very "steep cut-off" that allow inputs up to 20 kHz to pass at full level while providing sufficient attenuation (90-100 dB) at 22.05 kHz. The practice of "vari-speeding" the recording, to allow musicians to sing or play in a more comfortable range than the one in which the track was to be played back, would also have introduced problems: when the sample rate was reduced below 44.1 kHz during the vari-speed operation, the filters would no longer remove signals above the Nyquist frequency (one-half the SF), resulting in alias frequencies being generated. At 48 kHz, filter requirements were also eased compared to 44.1 kHz, where the early implementation of the required filters with very steep cut-offs quite often resulted in audible side-effects in the audio frequency range.
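
The vari-speed problem can be illustrated with a little arithmetic. The following is a minimal sketch, not a model of any particular recorder: the 10% slow-down and the 20 kHz test tone are assumed values chosen for illustration, and the function names are hypothetical.

```python
def nyquist_hz(sample_rate_hz):
    """Nyquist frequency: one-half the sample frequency."""
    return sample_rate_hz / 2


def alias_hz(tone_hz, sample_rate_hz):
    """Frequency at which a tone appears after sampling with no filtering."""
    folded = tone_hz % sample_rate_hz
    return min(folded, sample_rate_hz - folded)


nominal_sf = 44100
varispeed_sf = nominal_sf * 9 // 10   # assumed 10% slow-down: 39690 Hz
tone = 20000                          # passed by a filter designed for 44.1 kHz

print(nyquist_hz(nominal_sf))         # 22050.0
print(nyquist_hz(varispeed_sf))       # 19845.0
print(alias_hz(tone, nominal_sf))     # 20000 (reproduced correctly)
print(alias_hz(tone, varispeed_sf))   # 19690 (an alias, not the original tone)
```

With the filter left at its 44.1 kHz setting, the 20 kHz tone lies above the reduced Nyquist frequency of 19.845 kHz and folds back into the audio band.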

Additionally, early digital video systems adopted 48 kHz as their standard SF, which increased the need to convert digital recordings made at 44.1 kHz to 48 kHz and vice-versa.

Largely due to the lack of high quality digital processing that performed tasks common to Mastering, such as equalization and compression/limiting, the need for high-quality sample rate conversion was often addressed simply by playing the original recording back through a DA converter for analog processing and re-encoding using an AD converter operating at the desired SF.

As the pursuit of quality continued in digital audio, higher sample frequencies equal to twice the original standards of 44.1 and 48 kHz were introduced. Using SFs that are exactly twice or one-half the output SF makes the use of synchronous sample frequency conversion possible, with the resulting increased accuracy and reduced computational demands. Changes in the method used for conversion with the introduction of over-sampling converters also meant that conversion could occur at higher SFs than the output (AD) sample frequency or input (DA) sample frequency. By converting at frequencies higher than 44.1 or 48 kHz and employing digital filtering during conversion, filter design constraints were eased significantly, allowing analog filters with higher cut-off frequencies and thus less-steep cut-offs to be used. This helped eliminate the source of many of the audible side-effects of the steep cut-off analog filters used in the early converter designs.

High-quality digital SFC is computation-intensive. Due to the limitations of DSP technology in the 1980s when the CD was introduced, professional audio engineers often found the results produced by early SFC devices to be less than satisfactory. As technology advanced, the use of DSP based SFC became commonplace, and it is not unusual for contemporary recordings to be made at 96 or 88.2 kHz and to continue to be processed during Mastering at one of these higher sample frequencies until the last stage of Mastering, where the final CD file is output using SFC. This is also due in part to the increasing demand for "high resolution" formats for distribution via other methods such as audio DVD or digital download.

Overview

Linear PCM recording employs equal "step-size" in both the time domain and amplitude domain. Accurate playback requires the "step-size" to be exactly the same in playback as it was in recording, with any variation from the ideal resulting in distortion. If the SF of a recording is changed during playback, the pitch and duration of the recording will change as a result. In order to change the sample rate without affecting the pitch or duration, new "samples" that are between the original samples on the timeline must be generated.
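
The effect of simply playing the same samples at a different SF can be put into numbers. This is a small illustrative sketch; the function name and the 60-second example are assumptions, not something taken from a specific device.

```python
import math


def playback_change(recorded_sf, playback_sf, duration_s):
    """Return (speed ratio, new duration in s, pitch shift in semitones)
    when samples recorded at recorded_sf are clocked out at playback_sf."""
    ratio = playback_sf / recorded_sf
    new_duration = duration_s * recorded_sf / playback_sf
    semitones = 12 * math.log2(ratio)
    return ratio, new_duration, semitones


# One minute recorded at 44.1 kHz, played back at 48 kHz without conversion.
ratio, new_duration, semitones = playback_change(44100, 48000, 60)
print(round(ratio, 4))       # speed-up factor
print(new_duration)          # shorter than the original 60 s
print(round(semitones, 2))   # pitch rises by roughly a semitone and a half
```

The recording plays about 8.8% fast, which is why SFC must generate new samples rather than re-clock the existing ones.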

Because there is no information available in the original recording "between the samples," the only way to generate "new samples" is to use some form of interpolation. The crudest method of interpolation is "linear interpolation" and, as the name implies, it is only accurate for a straight line. Because music waveforms are much more complex in nature than a straight line, more sophisticated methods of interpolation are required to produce acceptable results.
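
The limitation of linear interpolation can be seen with a short sketch. The test signals below (a ramp, a 100 Hz tone and a 10 kHz tone at an SF of 44.1 kHz) and the helper names are assumptions chosen for illustration.

```python
import math

SF = 44100.0  # assumed sample frequency for the illustration


def linear_interpolate(samples, position):
    """Estimate the value at a fractional sample index by drawing a
    straight line between the two neighbouring samples."""
    i = int(math.floor(position))
    frac = position - i
    return samples[i] * (1.0 - frac) + samples[i + 1] * frac


def worst_midpoint_error(tone_hz, count=64):
    """Largest error when estimating a sine halfway between its samples."""
    tone = [math.sin(2 * math.pi * tone_hz * n / SF) for n in range(count)]
    return max(
        abs(linear_interpolate(tone, n + 0.5)
            - math.sin(2 * math.pi * tone_hz * (n + 0.5) / SF))
        for n in range(count - 1)
    )


# A straight line is reproduced exactly.
ramp = [0.5 * n for n in range(8)]
ramp_error = abs(linear_interpolate(ramp, 3.5) - 0.5 * 3.5)

low_error = worst_midpoint_error(100.0)     # slowly changing waveform
high_error = worst_midpoint_error(10000.0)  # rapidly changing waveform
print(ramp_error, low_error, high_error)
```

The error is zero for the ramp and very small for the low-frequency tone, but it reaches a large fraction of full scale for the 10 kHz tone, where the waveform curves sharply between samples.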

One example of linear interpolation applied to audio is the audio CD standard. In order to achieve the goal of recording 75 minutes of stereo audio on a CD with the technology available at the time, Sony/Philips decided to employ a lower level of error correction than is typical of computer data standards. This meant that errors beyond a certain size could not be completely corrected. Instead, linear interpolation (or "averaging") was employed as an "acceptable" means of partially correcting the error, to avoid the possibility of more catastrophic results, such as a loud "pop," if the error was not corrected in some manner. Experience has shown that the audibility of averaging is very program dependent, and in many cases it was fairly audible, especially when listening with headphones.

This example is of a very short average, between one and a few sample periods in duration, with an immediate return to accurate reproduction. The consequences of applying linear interpolation to an entire audio file would be quite unacceptable in high fidelity applications.

High quality SFC requires much more sophisticated interpolation, and this requires large amounts of DSP to accomplish. The input data is typically "re-sampled" using interpolation to a sample frequency that is a multiple of the output sample frequency, so that samples "do exist" at the desired output SF. This technique is sometimes referred to as "over-sampling," and some form of over-sampling is also employed in contemporary audio converter design. High-accuracy interpolation involves including information from audio samples that occur both well before and well after each audio sample in the computation, which increases both the time and processing power required to produce the results.
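
As a rough sketch of the idea, the code below finds the integer ratio that links two sample frequencies and then computes each output sample from input samples on both sides of it. The Hann-windowed sinc kernel, the half-width of 16 samples and all function names are assumptions made for illustration; this is not a description of any particular product's algorithm, and a production-quality converter would use a much longer, carefully designed filter (including low-pass filtering when converting downward).

```python
import math
from fractions import Fraction


def conversion_ratio(input_sf, output_sf):
    """Return (up, down) so that output_sf = input_sf * up / down.
    input_sf * up is a common multiple of both sample frequencies."""
    r = Fraction(output_sf, input_sf)
    return r.numerator, r.denominator


def windowed_sinc_sample(samples, position, half_width=16):
    """Estimate the value at a fractional input index using neighbouring
    samples before and after it, weighted by a Hann-windowed sinc."""
    center = int(math.floor(position))
    total = 0.0
    for k in range(center - half_width + 1, center + half_width + 1):
        if 0 <= k < len(samples):
            x = position - k
            sinc = 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)
            window = 0.5 * (1.0 + math.cos(math.pi * x / half_width))
            total += samples[k] * sinc * window
    return total


def resample(samples, input_sf, output_sf, half_width=16):
    """Upward conversion sketch: output sample m sits at input index
    m * down / up, a position on the common over-sampled time grid."""
    up, down = conversion_ratio(input_sf, output_sf)
    n_out = len(samples) * up // down
    return [windowed_sinc_sample(samples, m * down / up, half_width)
            for m in range(n_out)]


up, down = conversion_ratio(44100, 48000)
print(up, down, 44100 * up)   # 160 147 7056000

# 10 ms of a 1 kHz tone at 44.1 kHz, converted to 48 kHz.
tone = [math.sin(2 * math.pi * 1000 * n / 44100) for n in range(441)]
converted = resample(tone, 44100, 48000)
print(len(converted))         # 480
```

For 44.1 kHz to 48 kHz the common grid is 7.056 MHz (160 times the input SF and 147 times the output SF). Each output sample here draws on 32 input samples, which gives a sense of why accuracy costs both processing time and delay.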