Latency

Overview

The term “latency” is used broadly to describe the time it takes for an output to result from an input to a system. In audio, this term is most commonly used to describe the time it takes for an analog audio input signal to propagate through a digital audio device or system and exit that system in the form of an analog audio signal.

History

Before the advent of digital audio, most people were not concerned with the very small but finite delay between when an analog audio signal entered a piece of analog audio equipment and when it exited. Analog delays are extremely short in terms of human perception and are therefore, for all practical purposes, non-existent. For this reason, most people consider analog audio circuitry to have zero latency.

Due to the nature of contemporary digital audio processing, virtually all digital audio equipment has some delay between when the analog audio enters and when it exits. With hardware-based systems, this delay can be minimized, so that a large part of the remaining latency is caused by conversion delay. The delay in this case can be too short to be perceived as a distinct echo, but long enough to be heard as a flange or doubling of the original audio signal. With computer-based recording systems, the need for RAM buffering may increase this delay significantly, to the point where it is perceived as an echo and seriously affects the performance of musicians who are playing while listening to themselves through the recording software.

Basics

There is more than one source of delay in computer-based digital audio recording systems. The three largest contributors are:

The AD and DA converters used to encode and decode analog audio to and from digital audio.
The RAM buffering needed to get the digital audio between the audio interface, recording software, and hard drive.
The RAM buffering required for signal processing by high-quality plug-ins.

Issues

There are two important areas where latency can cause operational issues: cue-mix monitoring, and the relative timing of playback signals recorded using external converters.

Cue-mix monitoring

“Cue-mix monitoring” refers to the practice of using headphones to listen to both the previously recorded tracks and the live inputs through the recording system during recording or overdubbing. In this situation, the musician must hear both the playback of the previously recorded material and themselves as they play, for timing “cues.” If more than one musician is playing, they must also be able to hear the other musicians.

Recording with a combination of built-in and external converters

In the majority of cases, the recording software cannot automatically compensate for the conversion delay of the external converters because it has no way to obtain this information from the converters via the digital audio interface. The result is that tracks recorded through external converters will appear in the recording software’s timeline at a slightly different time than tracks recorded with built-in converters. In this case, the delay introduced in the digital domain affects the timing of the resulting analog output during playback of a mix containing signals recorded with both the built-in converters and the external converters.

Recording with more than one type of external converter

Recording with more than one type of external converter can result in a different delay for each type of converter, with none of the resulting recordings “lining up” with the playback signals in the software’s timeline. In this case, the delay introduced in the digital domain affects the timing of the resulting analog output during playback of a mix containing signals recorded with each type of external converter.

Solutions to latency in cue-mix monitoring have no direct effect on latency issues when using external converters. Thus, external conversion latency issues can still exist in systems that offer low-latency monitoring.

Solutions

Monitoring during Recording and Overdubbing

There are a variety of solutions to latency in the cue mix during recording and overdubbing. At the most basic level, the RAM buffer setting can be minimized, but the lower limit is set by how many tracks can be played or recorded reliably. Disabling plug-ins (particularly high-quality plug-ins with correspondingly longer processing delays) can also help minimize system latency. Using higher sample frequencies can also shorten delays to a certain extent, at the cost of more storage space and more processing power for plug-in processing. Some systems have low-latency monitoring that effectively bypasses the recording software to provide a DSP mix of the playback with the digitized input signals. In most cases, effects plug-ins are not available for the live signals because they are not routed through the recording software. But even in the case of low-latency monitoring, conversion delay is still present and limits how much the delay can be reduced.
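The relationship between buffer setting, sample frequency, and delay described above can be illustrated with a little arithmetic. The following is a minimal sketch, not a description of any particular recording software: the function name `buffer_latency_ms` and the buffer sizes shown are illustrative assumptions.

```python
def buffer_latency_ms(buffer_frames: int, sample_rate_hz: float) -> float:
    """Time needed to fill (or play out) one buffer, in milliseconds."""
    return 1000.0 * buffer_frames / sample_rate_hz


# Illustrative buffer sizes and sample rates (assumed values, not
# recommendations for any specific system).
for rate in (44_100, 96_000):
    for frames in (64, 256, 1024):
        ms = buffer_latency_ms(frames, rate)
        print(f"{frames:>5} frames @ {rate} Hz: {ms:.2f} ms")
```

The same buffer setting represents a shorter delay at a higher sample frequency, which is why raising the sample frequency can reduce latency. The figure is for a single buffer; a signal monitored through the recording software typically passes through at least one input buffer and one output buffer, and conversion delay is added on top.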

All AD converters have some finite delay between when the analog audio enters the input and when the corresponding digital audio signal appears at the output. Likewise, all DA converters have a finite delay between when the digital audio signal arrives at the input and when the corresponding analog signal exits the analog output. The length of the delay varies with the model, the sample rate, and the type of digital input/output interface.

Most contemporary audio converters are based on the principle of “oversampling,” which means that they operate internally at a higher clock frequency than the input or output PCM digital audio signal. In sigma-delta AD conversion, the signal may be sampled at 64 to 1024 times the output sample frequency at a resolution of 1 to 5 bits. To achieve high-quality results, DSP filtering is employed. This filtering requires processing information over a number of output samples, and the quality of the results is related to the number of samples included in the calculation. Increasing the number of samples increases latency, and decreasing it reduces audio quality, so a balance must be found between audio quality and conversion delay.

The effort to minimize conversion delay has led to compromises in audio quality, either by employing “low-latency conversion,” which uses lower-quality digital filtering to reduce processing time, or by converting at sample frequencies higher than 96 kHz. The conversion delay adds to the latency of the system and limits how far the delay can be reduced in low-latency monitoring. Even if this delay is reduced to the point where it is not immediately obvious, as a distinct echo would be, it can still affect the performance of musicians who are listening to the delayed signal at the same time they are playing or singing.
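To give a feel for why more filter samples mean more delay, consider one common filter type. The sketch below assumes a linear-phase (symmetric) FIR filter, whose delay is (N - 1) / 2 samples for N taps at the rate the filter runs. This is an assumption made for illustration; the text does not say which filter designs a given converter uses, and the tap counts are hypothetical.

```python
def fir_group_delay_ms(num_taps: int, filter_rate_hz: float) -> float:
    """Delay of a linear-phase FIR filter: (N - 1) / 2 samples at its own rate."""
    delay_samples = (num_taps - 1) / 2.0
    return 1000.0 * delay_samples / filter_rate_hz


# Hypothetical tap counts: a shorter filter and a longer, higher-quality one,
# both assumed to run at a 96 kHz rate.
for taps in (63, 255):
    print(f"{taps} taps: {fir_group_delay_ms(taps, 96_000):.3f} ms")
```

Under these assumptions, the longer filter has roughly four times the delay of the shorter one, which is the trade-off between filter quality and conversion delay in miniature.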

The alternative is to provide zero-latency monitoring of the live signals in the analog domain (prior to conversion and its associated delays). The advantages of this approach are that it removes the need to minimize RAM buffer settings, which can limit the number of tracks that can play back at the same time, and that it allows the use of high-quality plug-ins with their associated need for longer processing times. Software that has “latency compensation” for plug-ins does not eliminate the delay required for plug-in processing. It simply adds delay to the other tracks so that they are delayed by the same amount as the track with the plug-in.
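The way plug-in latency compensation adds delay rather than removing it can be sketched as follows. This is a simplified illustration, not the algorithm of any specific recording software; the function name, track names, and latency values are hypothetical.

```python
def compensation_delays(plugin_latency_samples: dict) -> dict:
    """Extra delay to add to each track so all tracks match the slowest one."""
    longest = max(plugin_latency_samples.values(), default=0)
    return {track: longest - latency
            for track, latency in plugin_latency_samples.items()}


# Hypothetical per-track plug-in latencies, in samples.
tracks = {"vocals": 0, "guitar": 512, "drums": 64}
print(compensation_delays(tracks))
```

Every track ends up delayed by the latency of the slowest plug-in chain (512 samples in this example), so the tracks stay aligned with each other but the total delay heard in the cue mix is not reduced.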

This analog approach also makes the choice of sample frequency a non-issue, because it is not necessary to use very high sample frequencies to minimize the recording system delays. Conversion quality also does not need to be compromised by employing low-latency conversion. The Lavry LK-1 Latency Killer is an example of this approach to generation of a zero-latency cue mix.

External converters

The other important effect of conversion delay is seen when external converters are used. In the following discussion, the concept of “musically significant” delay is introduced. Because sound travels at a fairly low speed in terms of human perception, most musicians are accustomed to making small adjustments to the timing of their playing to compensate for the acoustic delay caused by the time it takes sound to travel through the air from other musicians or instruments.

Acoustic delays are typically in the range of a few milliseconds (a millisecond equals one one-thousandth of a second). In very rough terms, sound travels approximately one foot in one millisecond (~1.126 feet/millisecond), so simply placing the microphone one foot further from the musician or instrument will add approximately one millisecond of delay.

Most people consider a delay of between 0 and 3 milliseconds not to be musically significant when listening to the playback of multiple tracks. This is not necessarily the case when a musician listens to themselves through headphones while playing, such as in overdubbing. In this special case, even relatively short delays can affect performance at a subtle level. By contrast, a small difference in delay during playback only is heard as the musician playing slightly more “on the beat” or “behind the beat.”
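The distance-to-delay rule of thumb above is easy to check numerically. The following minimal sketch uses the approximate figure of 1.126 feet per millisecond quoted in the text; the function names are illustrative.

```python
# Approximate speed of sound quoted in the text, in feet per millisecond.
SPEED_OF_SOUND_FT_PER_MS = 1.126


def acoustic_delay_ms(distance_ft: float) -> float:
    """Delay for sound to travel the given distance through air."""
    return distance_ft / SPEED_OF_SOUND_FT_PER_MS


def equivalent_distance_ft(delay_ms: float) -> float:
    """Distance sound travels through air in the given time."""
    return delay_ms * SPEED_OF_SOUND_FT_PER_MS


print(f"10 ft of air adds {acoustic_delay_ms(10.0):.2f} ms")
print(f"A 2 ms delay corresponds to {equivalent_distance_ft(2.0):.2f} ft")
```

Expressing a delay as an equivalent distance can help put a converter's delay into the same terms as the acoustic delays musicians already adapt to.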

All AD converters have some finite delay between when the analog audio enters the input and when the corresponding digital audio signal appears at the output. Likewise, all DA converters have a finite delay between when the digital audio signal arrives at the input and when the corresponding analog signal exits the analog output. The length of the delay varies with the model, the sample rate, and the type of interface used to send or receive the digital audio signal.

If the converters are integrated in a digital audio workstation system, it is possible for the recording software to access the time value of this delay and compensate for it accordingly. This is similar to the manner in which the software uses the buffer settings to determine the offset required to “line up” new recordings with previously recorded tracks. Because RAM buffering is the source of a major part of the total latency in computer-based recording systems, this is a significant feature. However, in the majority of cases when external conversion is used, the software cannot automatically set itself to the correct value because it cannot access this information through the digital audio interface. This can be true of almost any system where external converters are used, including those that provide low-latency monitoring.

In a system that has multiple external converters and no built-in converters, the only audible reference for the correct time during playback is the converter with the shortest delay, unless there is some other form of reference for the playback, such as tracks generated by “virtual instrument” software or sound files imported from other sources.

In some cases, this difference in delay between converters is short enough to be musically insignificant and can therefore be ignored. In other cases, it can be musically significant. To perfectly align the signals recorded with external converters to the previously recorded tracks, however, it is usually necessary to compensate for the delay manually.

Manual Compensation for Converter Latency

There are two methods to manually compensate for the delay, when necessary:

If the software has a “latency compensation” setting for this purpose, each external converter’s delay for a given sample rate can be determined and then entered into the software setting. Depending on the step size of the available settings, it may be possible to closely approximate the actual delay. Again, whether further action is required depends on whether the difference between the setting and the actual delay is musically significant, or on personal preference.
If the compensation is not available or the setting is not exact enough, a completely “manual” approach allows the recorded signals to be aligned to sample accuracy. It involves playing back a signal from the system and simultaneously re-recording this signal through the external converter to measure the delay. By measuring the difference in the position of the two waveforms in the software’s timeline, the “offset value” can be determined for the specific converter, sample rate, and buffer setting. Files recorded through the external converter then need to be repositioned by offsetting the start time by this offset value. Details of this procedure are beyond the scope of this discussion.

If the delay is musically significant, it would be necessary to offset each segment prior to playback during overdubbing, so that the delay does not distract the musician listening to a previous take before the punch-in point. If the delay is minimal, it may be possible to simply ignore it during the recording and overdubbing process and, upon completion of the recording, apply the offset to all of the newly recorded files as a group.
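The idea behind the second method, comparing the position of the played-back waveform with its re-recorded copy, can be sketched in code. This is only an illustration under simplifying assumptions: it uses a short synthetic test signal and a brute-force search, whereas in practice the comparison is usually made by inspecting the waveforms in the software's timeline. The function names and values are hypothetical.

```python
def measure_offset_samples(reference, rerecorded, max_lag):
    """Return the lag (in samples) at which `rerecorded` best matches `reference`."""
    best_lag, best_score = 0, float("-inf")
    for lag in range(max_lag + 1):
        # Correlate the reference against the re-recording shifted by `lag`.
        score = sum(r * rerecorded[i + lag]
                    for i, r in enumerate(reference)
                    if i + lag < len(rerecorded))
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def corrected_start(start_sample, offset_samples):
    """Move a recorded file earlier in the timeline by the measured offset."""
    return start_sample - offset_samples


# Synthetic test signal, re-recorded with a simulated delay of 37 samples.
click = [0.0, 1.0, -0.5, 0.25, 0.0]
loopback = [0.0] * 37 + click + [0.0] * 20

offset = measure_offset_samples(click, loopback, max_lag=50)
print(f"Measured offset: {offset} samples")
print(f"File at sample 48000 moves to sample {corrected_start(48000, offset)}")
```

As the text notes, an offset measured this way is only valid for the specific converter, sample rate, and buffer setting in use when the measurement was made.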

In the majority of cases, the difference in delay between internal and external converters is likely to be musically insignificant, and can therefore be either ignored or compensated for after the recording is complete.