This chapter presents a modular networked audio system as a reference for the rest of this white paper. Twelve modules are introduced as the building blocks of a system. The described system processes audio in acoustic, analogue, digital and networked formats.

Audio System

A collection of components connected together to process audio signals, with the goal of increasing the system's sound quality.

The following pages will elaborate further on audio processes, formats and components.

2.1 Audio processes

A system’s audio processing can include:

table 201: audio processing types

  function     description
  -----------  ---------------------------------------------------------------
  conversion   format conversion of audio signals
  transport    transport of signals, e.g. through cables
  storage      storage for editing, transport and playback using audio media,
               e.g. tape, hard disk, CD
  mixing       mixing multiple inputs to multiple outputs
  change       equalising, compression, amplification, etc.

An audio system can be mechanical - e.g. two empty cans with a tensioned string in between, or a mechanical gramophone player. But since the invention of the carbon microphone in 1877-1878, independently by Edison and Berliner, most audio systems have used electrical circuits. From the early 1980s onwards, many parts of audio systems gradually became digital, leaving only head amps, A/D and D/A conversion and power amplification as electronic circuits, and microphones and loudspeakers as electroacoustic components. Currently, digital point-to-point audio protocols such as AES10 (MADI) are being replaced by network protocols such as Dante and EtherSound.

In this white paper, the terms 'networked audio system' and 'digital audio system' are applied loosely, as many of the concepts presented concern both. When an issue is presented as applying to networked audio systems, it does not necessarily apply to digital audio systems in general. When an issue is presented as applying to digital audio systems, it also applies to networked audio systems.

2.2 Audio formats

Although, with the introduction of electronic instruments, an input can also be an electrical analogue or digitally synthesised signal, in this white paper we will assume all inputs and outputs of an audio system to be acoustic signals. In the field of professional audio, the following identification is used for the different formats of audio:

table 202: audio formats

  format      description
  ----------  ---------------------------------------------------------------
  acoustic    audio signals as pressure waves in air
  analogue    audio signals as a non-discrete (continuous) electrical voltage
  digital     audio signals as data (e.g. 16 or 24 bit - 44.1, 48, 88.2 or 96kHz)
  networked   audio data as streamed or switched packets (e.g. Ethernet)
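
As an illustration of the 'digital' row in table 202, the short Python sketch below (added for illustration, not part of the original system description) quantises a 1kHz sine wave into 24-bit samples at a 48kHz sampling rate - the representation most commonly used in professional audio equipment.

    import math

    SAMPLE_RATE = 48_000                     # samples per second (48kHz)
    BIT_DEPTH = 24                           # bits per sample
    FULL_SCALE = 2 ** (BIT_DEPTH - 1) - 1    # largest positive 24-bit value

    def quantise_sine(freq_hz, n_samples):
        """Return a full-scale sine wave as signed 24-bit integer samples."""
        return [round(FULL_SCALE * math.sin(2 * math.pi * freq_hz * n / SAMPLE_RATE))
                for n in range(n_samples)]

    # one millisecond (48 samples) of a 1kHz tone
    samples = quantise_sine(1_000, 48)
    print(samples[:4])   # the first few quantised sample values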

A networked audio system includes these audio formats simultaneously - using specialised components to convert from one to another:

table 203: audio format conversion components

  source format        destination format   component
  -------------------  -------------------  -------------
  acoustic          -> analogue             microphone
  analogue          -> digital/networked    A/D converter
  digital/networked -> analogue             D/A converter
  analogue          -> acoustic             loudspeaker

2.3 Audio system components

In this white paper we assume an audio system to be modular, using digital signal processing and networked audio and control distribution. An audio system's inputs and outputs are assumed to be acoustic audio signals - with the inputs coming from one or more acoustic sound sources, and the outputs being picked up by one or more listeners. A selection of functional modules constitutes the audio system in between sound sources and listeners.

A typical networked audio system is presented in the diagram below. Note that this diagram presents the audio functions as separate functional blocks. Physical audio components can include more than one functional block - e.g. a digital mixing console includes head amps, A/D and D/A converters, DSP and a user interface. The distribution network in this diagram can have any topology - including ring, star or any combination.

Acoustic source

An acoustic sound source generates vibrations and radiates them into the air. Sound sources can be omni-directional - radiating in all directions - or directional, concentrating energy in one or more directions. Musical instruments use strings (e.g. guitar, piano, violin), surfaces (e.g. drums, marimba) or wind (e.g. flute, trombone) to generate sound. In nature, sound is often generated by wind shearing past objects (e.g. trees, buildings). The output of an audio system is also an acoustic sound source. Finally, almost all human activities (including singing) and man-made machinery (including car engines and bomb detonations) generate sound. The lowest sound pressure level generated by acoustic sound sources approaches minus infinity dB - e.g. resting bodies at absolute zero temperature. The maximum undistorted sound pressure level is said to be above 160dBSPL, beyond which vacuum pockets start to form in the air. The lowest frequency an acoustic sound source can generate approaches zero Hertz ('infrasonic'), while the maximum pressure wave frequency in air without distortion is said to be above 1GHz.
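
The pressure figures above can be checked with a minimal sketch (added here as an illustration, using the standard 20 µPa reference pressure): converting dBSPL to pascals shows that 160dBSPL corresponds to about 2kPa - still far below atmospheric pressure (~101kPa), which a sound wave's amplitude only reaches near 194dBSPL.

    import math

    P_REF = 20e-6       # reference pressure in pascals (0dBSPL)
    P_ATM = 101_325.0   # standard atmospheric pressure in pascals

    def spl_to_pascal(spl_db):
        """Pressure in pascals for a given sound pressure level in dBSPL."""
        return P_REF * 10 ** (spl_db / 20)

    def pascal_to_spl(pressure_pa):
        """Sound pressure level in dBSPL for a given pressure in pascals."""
        return 20 * math.log10(pressure_pa / P_REF)

    print(spl_to_pascal(160))     # ~2000 Pa - loud, but well below 1 atm
    print(pascal_to_spl(P_ATM))   # ~194 dBSPL - amplitude equal to 1 atm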

Human auditory system

The human auditory system is the combination of two ears and one brain, creating a hearing sensation invoked by audio signals generated by acoustic sources. The inner ear codes a level range of approx. 120dB and a frequency range of approx. 20kHz into neural firing patterns, and sends them to a specialised part of the brain called the auditory nervous system. The brain interprets the coded signals and invokes a hearing sensation. The hearing sensation is most significantly influenced by changes in level and frequency over time, with the smallest detectable time interval being as low as 6 microseconds. Basic parameters of hearing sensations are loudness, pitch, timbre and localisation.

Microphone

Microphones convert acoustic signals into electrical signals - the analogue domain. Dynamic microphones use a coil and a magnet to generate the electrical signal; condenser microphones achieve higher accuracy using a variable-capacitor construction that is much lighter than a coil. Further varieties include piezo microphones and electromagnetic pickup elements that directly sense guitar strings.

Head amp

The professional audio market adopted a nominal analogue signal level of 0.775Vrms as the 0dBu reference for line-level audio signals, optimally supporting electronic circuit designs with the balanced 9V to 15V power supplies used in many audio products. As microphones generally output a much lower signal level - typically around 0.3mV (-68dBu) for the average sound level of conversational speech at 1 metre from the microphone (60dBSPL) - these signals are amplified to nominal level before entering further electronic circuits, using a microphone preamplifier or 'head amp', abbreviated HA. Head amps most commonly have an amplification range of around 70dB and are designed to have a very low noise floor. The most common Equivalent Input Noise (EIN) of a head amp is -128dBu (0.3µVrms), with a maximum input level before clipping of up to +30dBu (24V). But as the balancing and buffering circuits of the HA block also add noise, and analogue level switching changes the signal levels in the gain control circuit, the maximum dynamic range a typical HA delivers to the A/D block is around 112dB. Of course, whenever the HA gain is increased to match a microphone's signal level, the HA noise floor increases as well, lowering the dynamic range. More details on head amp quality issues are presented in chapter 7.
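
The dBu arithmetic used in this paragraph can be reproduced with a small sketch (an added illustration; the figures are the ones quoted above): it converts between dBu and volts using the 0.775Vrms reference and derives the gain needed to bring a -68dBu microphone signal up to the 0dBu nominal level.

    import math

    V_REF = 0.775   # 0dBu reference level in volts (rms)

    def dbu_to_volts(dbu):
        return V_REF * 10 ** (dbu / 20)

    def volts_to_dbu(volts):
        return 20 * math.log10(volts / V_REF)

    print(dbu_to_volts(-68))    # ~0.0003 V (0.3mV): speech at 1 metre
    print(0 - (-68))            # 68dB of HA gain to reach 0dBu nominal
    print(dbu_to_volts(-128))   # ~0.3e-6 V (0.3uVrms): typical EIN
    print(dbu_to_volts(30))     # ~24.5 V: maximum input before clipping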

A/D converter

An A/D converter converts electrical (analogue) signals to digital data for further processing in digital audio systems. This process is called 'sampling', with most modern A/D converters using a 24-bit data width to represent audio signals. This allows a theoretical dynamic range of approximately 144dB to be registered accurately, with the inaccuracies of the A/D process accumulating in a digital noise floor at -144dB. Most modern digital audio equipment uses 48kHz or 96kHz sampling rates, supporting 20kHz or 40kHz frequency ranges. More details on sampling are presented in chapter 5.
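
The 144dB figure follows directly from the 24-bit data width: each bit contributes about 6.02dB of theoretical dynamic range, and the sampling rate sets the Nyquist limit at half its value. A minimal sketch (added for illustration):

    import math

    def dynamic_range_db(bits):
        """Theoretical dynamic range of an ideal converter: 20*log10(2^bits)."""
        return 20 * math.log10(2 ** bits)

    def nyquist_hz(sample_rate_hz):
        """Highest representable frequency: half the sampling rate."""
        return sample_rate_hz / 2

    print(dynamic_range_db(24))   # ~144.5dB for 24-bit converters
    print(dynamic_range_db(16))   # ~96.3dB for 16-bit (CD)
    print(nyquist_hz(48_000))     # 24000.0 Hz; the quoted 20kHz range
                                  # leaves margin for anti-aliasing filters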

Distribution network

A distribution network is a collection of components used to transfer data from and to all physical locations in the audio system. The distributed data of course includes audio, but it can also include data to control audio components, and other media data such as video and lighting control. A distribution network can consist of multiple point-to-point connections, separately for audio, control and other data. Such a network needs hardware routers or patch panels at every location to patch sources and destinations. This is not only expensive, but it also limits design freedom, as functional connections are restricted by physical connections - and for every change in a system's functional design, the physical design has to change with it. Also, distribution systems based on point-to-point connections have very limited redundancy options. This is why networked systems have become a standard for audio distribution - allowing the functional and physical designs to be fully independent and also fully redundant. The audio protocol can be based on Ethernet, or it can include an embedded Ethernet tunnel. As most control systems use Ethernet, and protocol converters are available for other protocols (e.g. USB, MIDI, RS232), the use of Ethernet allows virtually any digital data format to be transported over the same distribution network. If the audio system is Ethernet based - using Dante, EtherSound and/or CobraNet - the distribution network will typically be a collection of Ethernet switches and cables. More details on operational (non-audio) quality issues in networks are presented in chapter 9.
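
To illustrate why Ethernet-based networks comfortably support high channel counts, here is a rough back-of-the-envelope sketch (added for illustration; real protocols such as Dante or CobraNet add packet headers and other overhead on top of this raw payload):

    def raw_audio_bandwidth_mbps(channels, sample_rate_hz=48_000, bits=24):
        """Raw audio payload in megabits per second, ignoring protocol overhead."""
        return channels * sample_rate_hz * bits / 1e6

    print(raw_audio_bandwidth_mbps(64))    # ~73.7 Mbit/s for 64 channels
    print(raw_audio_bandwidth_mbps(512))   # ~589.8 Mbit/s - still within
                                           # a gigabit Ethernet link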

Change & mixing (DSP)

Digital Signal Processors (DSPs) are used to perform real-time change and mixing of audio signals. Some LSI manufacturers, including Yamaha, Analog Devices, Motorola and Texas Instruments, offer dedicated DSP hardware architectures. Combined with general-purpose Field Programmable Gate Array (FPGA) chips, the processing power of digital systems has evolved to a level far beyond the capabilities of previously used analogue systems. High data widths - e.g. 32 bit or higher - ensure that the error residuals of DSP calculations stay well under the head amp's and A/D converter's noise floors, leaving algorithm design and user interfacing as the main quality parameters for DSP functionality.
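
As a toy illustration of the mixing function (added here; real DSP firmware uses fixed-point hardware and block-based processing rather than per-sample Python), the sketch below mixes one frame of input samples to multiple outputs through a gain matrix:

    def mix(inputs, gain_matrix):
        """Mix one sample frame: outputs[m] = sum of gain_matrix[m][n] * inputs[n].

        Python floats are 64-bit, so accumulation errors stay far below a
        24-bit converter's noise floor - the point made in the text above.
        """
        return [sum(g * x for g, x in zip(row, inputs)) for row in gain_matrix]

    # two inputs mixed to two outputs (unity gain plus a -6dB bleed)
    frame = [0.25, -0.50]
    gains = [[1.0, 0.0],
             [0.5, 1.0]]
    print(mix(frame, gains))   # [0.25, -0.375]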

In the past, dedicated DSP was normally built into mixing consoles, effect units or speaker processors. But since networks started to support high channel counts, DSP units - including 'plug-in servers', 'mixing engines', effect units, speaker processors and user-programmable DSP units - can be located anywhere in the system, in any quantity. More details on DSP quality issues are presented in chapter 6.

Storage (recording, playback, editing)

A digital audio system can process audio in real time, but it can also store audio streams on media such as hard disks, memory cards, CDs and DVDs for later processing or playback. Through storage, an audio process can flow through multiple audio systems at different times - e.g. a multitrack live recording stored on a hard disk, then edited on a second system to an authoring DVD, then mixed down on a third system to CD, then transferred to a customer by post, and finally played back on a fourth system: the stereo system at the customer's home. Multitrack recording, editing and authoring are most commonly done with Digital Audio Workstation (DAW) software running on personal computers - using Ethernet connectivity to connect to networked audio systems.

D/A converter

D/A converters convert digital audio data to electrical (analogue) signals to be sent to power amplifiers, accepting the same data width and rate as the A/D converters and the distribution network of the audio system.

Power amplifier

A power amplifier increases an audio signal's voltage to a higher level at a low impedance to drive energy into loudspeakers. Modern power amplifiers use high-frequency switching output stages to directly drive loudspeakers (class D), sometimes combined with class AB circuits (class TD, EEEngine(*2A)). Some power amplifiers have distribution interfaces, DSP (for speaker processing) and D/A converters built in.
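
A quick sketch (added for illustration) of the voltage/power relationship a power amplifier deals with: the power delivered into a loudspeaker's nominal impedance is P = V^2 / R, so driving 500W into 8 ohms requires roughly 63Vrms at the output.

    import math

    def power_watts(v_rms, impedance_ohms):
        """Power delivered into a resistive load: P = V^2 / R."""
        return v_rms ** 2 / impedance_ohms

    def required_voltage(watts, impedance_ohms):
        """Output voltage needed for a target power: V = sqrt(P * R)."""
        return math.sqrt(watts * impedance_ohms)

    print(required_voltage(500, 8))   # ~63.2 Vrms for 500W into 8 ohms
    print(power_watts(63.2, 8))       # ~499W - consistent with the above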

Loudspeaker

Loudspeakers convert electrical signals into acoustic signals. High-quality loudspeaker systems use multiple transducers to generate a combined acoustic output, each delivering a separate frequency range. Multiple time-aligned transducers - 'line arrays' - can be used to generate coupled acoustic coverage. High-frequency transducers (tweeters, compression drivers, ribbon drivers) are available in sizes varying from 0.5" to 3", mid-frequency transducers from 5" to 15", and low-frequency transducers ('woofers', 'subwoofers') from 8" to 21". Loudspeakers and individual transducers have an efficiency (sensitivity) and a maximum SPL output (peak SPL), standardised through the AES2-1984 standard. In a networked audio system, the loudspeakers are the most prominent sources of distortion - depending on the build quality of the transducers, but also of the enclosure. Fortunately, the kind of harmonic distortion generated in loudspeakers often contributes positively to sound quality.
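
Sensitivity and peak SPL combine through simple arithmetic. The following sketch (added for illustration; it assumes free-field point-source behaviour and ignores power compression) estimates the SPL at a listening position from a loudspeaker's 1W/1m sensitivity:

    import math

    def spl_at_listener(sensitivity_db, power_w, distance_m):
        """Free-field SPL estimate: sensitivity + 10*log10(P) - 20*log10(d).

        Assumes a point source (6dB loss per doubling of distance) and no
        power compression - a simplification for illustration only.
        """
        return (sensitivity_db
                + 10 * math.log10(power_w)
                - 20 * math.log10(distance_m))

    # a 98dB (1W/1m) loudspeaker driven with 400W, heard at 16 metres
    print(spl_at_listener(98, 400, 16))   # ~99.9 dBSPL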

User interface

To allow sound engineers to operate audio systems, manufacturers of components provide some form of user interface. Conventional (mostly analogue) audio components use hardware 'tactile' user interfaces such as knobs and faders as an integral part of the analogue electronic circuitry. The introduction of digital technology brought remote and graphical interfaces such as mouse/trackpad, displays and touch screens, while the introduction of networking technology allows multiple user interfaces to coexist in one system, sharing physical connections through the network protocol, and also sharing functionality through common control protocols. Examples are the many graphical user interfaces available on personal computers and tablets for digital mixing consoles.