Saturday, February 23, 2013

3D audio technologies

State of the art in 3D audio technologies
In this chapter we present a brief overview of the state of the art in 3D surround sound. The technologies reviewed range from complete frameworks that cover the whole chain from capture to playback, such as Ambisonics and Wave Field Synthesis, to extensions of existing 2D approaches such as amplitude panning, with a brief mention of hybrid systems and solutions that have recently been introduced to the market.
This chapter is not meant to be a complete and detailed description of these technologies, but rather to introduce their most relevant aspects and give the reader a basic knowledge of the subject, providing context for the topics addressed in the rest of the thesis. References to key research papers and books are provided in each section.
Binaural audio
Binaural audio is perhaps the most straightforward way of dealing with three-dimensional audio. Since we perceive three-dimensional sound with our two ears, all the relevant information is contained in two signals; indeed, our perception is the result of interpreting the pressure signals that reach the two eardrums, so recording these signals and playing them back at the ears should suffice to recreate life-like aural experiences.
Our perception of the direction of sound is based on specific cues, mostly related to differences between the signals at the two ears, that our brain interprets and decodes. At the end of the nineteenth century, Lord Rayleigh identified two mechanisms for the localization of sound: time cues (which can also be interpreted as phase differences) are used to determine the direction of arrival at frequencies below about 700 Hz, while intensity cues (related to signal energy) are dominant above roughly 1.5 kHz. In the low-frequency region of the audible spectrum, the wavelength of sound is large compared to the size of the head, so sound travels almost unaffected and reaches both ears regardless of the direction of arrival. Moreover, unless a sound source is located very close to one ear, the small distance between the ears does not cause any significant attenuation of sound pressure due to the decay with distance.
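To put these figures in perspective, the short sketch below computes the wavelengths at the two crossover frequencies quoted above and the maximum interaural time difference predicted by a simple spherical-head model (Woodworth's approximation). The head radius of 8.75 cm and the speed of sound of 343 m/s are assumptions chosen for illustration, not values taken from the text.

    # Back-of-the-envelope check of the duplex-theory figures quoted above.
    # Assumed values (not from the text): spherical head of radius 8.75 cm,
    # speed of sound 343 m/s, Woodworth's ITD approximation.
    import math

    c = 343.0    # speed of sound in air [m/s]
    a = 0.0875   # assumed head radius [m]

    # Wavelengths at the two crossover frequencies mentioned in the text.
    for f in (700.0, 1500.0):
        print(f"wavelength at {f:.0f} Hz: {c / f:.2f} m")

    # Maximum interaural time difference (source at 90 degrees azimuth),
    # Woodworth model: ITD = (a / c) * (theta + sin(theta)).
    theta = math.pi / 2
    itd = (a / c) * (theta + math.sin(theta))
    print(f"maximum ITD (spherical head): {itd * 1e6:.0f} microseconds")

With these assumed values the wavelength at 700 Hz is about half a metre, several times the diameter of the head, and the maximum interaural delay comes out at roughly 0.65 ms, which corresponds to half a period at about 770 Hz; this is consistent with the 700 Hz limit for unambiguous phase cues mentioned above.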
The basic concept behind 3D binaural audio is that if one measures the
acoustic pressure produced by a sound field in the position of the ears of
a listener, and then reproduces exactly the same signal directly at the ears
of the listener, the original information will be reconstructed. Binaural audio is closely tied to our perception, because it implicitly takes into account the physical mechanisms involved in hearing. Binaural recordings are made by means of manikin heads with shaped pinnae and ear canals, with a pressure microphone inserted at the end of each ear canal, thus collecting the signals that a human listener would perceive. Experiments have also been done with miniature microphones inserted into the ear canals of a subject, to obtain recordings that are perfectly tailored to that person's outer-ear shape. Binaural playback requires headphones to deliver to each ear the corresponding recorded signal, and the technique provides a good spatial impression. It is worth mentioning that while listening to conventional mono or stereo material through headphones conveys a soundstage located within the head, the binaural technique accurately reproduces sounds outside the head, a property called "externalization".
Physically, the signals that reach the eardrums when a sound source emits sound from a certain position can be expressed as the convolution of the signal emitted by the source with the impulse response between the position of the source and each ear (neglecting the effects of the room). The corresponding head-related transfer functions (HRTFs) depend on the direction of the source, its distance from the listener, and the particular shape of the outer ear used during the recording. Various HRTF databases are available, offering impulse responses recorded with the source sampling a sphere at a fixed distance (a far-field approximation is adopted and distance is usually neglected). With such functions, binaural material can also be synthesized by convolution: once a source and its position are chosen, the left and right binaural signals are obtained by convolving the source signal with the left and right head-related impulse responses (the time-domain counterparts of the HRTFs) corresponding to that position. In this way, realistic virtual-reality scenarios can be reproduced. In the real-time playback of synthetic sound fields, the adoption of head tracking to detect the orientation of the listener and adapt the sound scene accordingly has proven invaluable in resolving the localization uncertainty related to the cones of confusion and the front-back ambiguity.
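As an illustration of this synthesis-by-convolution step, the sketch below renders a mono source from one direction, assuming the corresponding left and right head-related impulse responses have already been loaded from some HRTF database as NumPy arrays; the function and variable names are hypothetical and not taken from any particular library.

    # Minimal sketch of binaural synthesis by convolution with a pair of
    # head-related impulse responses (HRIRs). Loading the HRIRs from an
    # HRTF database is assumed to have been done elsewhere.
    import numpy as np
    from scipy.signal import fftconvolve

    def render_binaural(mono, hrir_left, hrir_right):
        """Convolve a mono signal with the left/right HRIRs of one source
        direction and return a two-channel (left, right) array."""
        left = fftconvolve(mono, hrir_left)
        right = fftconvolve(mono, hrir_right)
        stereo = np.stack([left, right], axis=-1)
        # Normalize so the result does not clip when written to a file.
        return stereo / np.max(np.abs(stereo))

A moving source can be approximated by switching (and cross-fading) between HRIR pairs as the desired direction changes; with head tracking, the same mechanism is driven by the listener's orientation instead of the source trajectory.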
