Ambisonics and Periphony [part 1]
[Embedded video]
There are a number of disadvantages to this way of recording surround sound. One of the major issues is compatibility with formats that have a different number of channels. The sound engineer has to check compatibility with mono, stereo and 5.1. In the future the engineer may also have to check against 7.1, 22.2 and whatever other discrete-channel surround system comes next. That would require a lot of time, and a room with enough speakers to cover every possible set-up.
Another issue faced by an organisation like the BBC is how we archive our material. If we archived the stereo, 5.1 and 7.1 mixes of a piece of audio, that's 16 channels in total, taking eight times the space of the stereo recording alone. The ITU standards were born out of a lot of research into which speaker angles gave the best sound, and are essential when setting up a studio or listening room. However, I would be surprised if many of our audience had their own ITU 5.1 set-up, and the talks I've had with friends in the computer games industry suggest most of their customers who listen in 5.1 don't follow the ITU's recommendation, preferring a square layout, perhaps because that fits best around their furniture. While games users may not be representative of the BBC's audiences, we shouldn't assume our 5.1 listeners are using an ITU-recommended set-up.
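The storage claim is simple channel arithmetic, sketched here under the assumption that every channel is stored at the same sample rate and bit depth:

```python
# Channel counts for each mix we might archive.
channels = {"stereo": 2, "5.1": 6, "7.1": 8}

# Per-channel storage scales linearly, so keeping all three mixes
# costs eight times as much as keeping the stereo mix alone.
total_channels = sum(channels.values())
ratio = total_channels / channels["stereo"]
```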
A possible alternative to these discrete-channel formats is a system called Ambisonics. Developed in the 1970s, it has had a cult following ever since but has yet to break into the mainstream, being of interest mainly to academics and select audio engineers. The fundamental idea behind Ambisonics is to represent the sound-field at a single point in space.
Without going into too much detail, it is an extension of coincident microphone techniques: audio is captured from three perpendicular figure-of-eight microphones, all positioned at the same point in space. When combined with an omnidirectional microphone, these four signals are known as B-format. This signal represents the three-dimensional sound-field.
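As a sketch of how those four signals relate to a source direction, here is a minimal first-order B-format panner in Python. The function and variable names are my own, and real encoders work on whole sample buffers and follow a specific channel convention (such as the traditional FuMa weighting used here):

```python
import math

def encode_b_format(sample, azimuth_deg, elevation_deg=0.0):
    """Pan a mono sample into first-order B-format (W, X, Y, Z)."""
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    w = sample / math.sqrt(2)                  # omnidirectional component
    x = sample * math.cos(az) * math.cos(el)   # front-back figure-of-eight
    y = sample * math.sin(az) * math.cos(el)   # left-right figure-of-eight
    z = sample * math.sin(el)                  # up-down figure-of-eight
    return w, x, y, z
```

A source dead ahead ends up entirely in W and X; raising the elevation moves energy into Z.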
So how might this technology help solve some of the problems described above? A major potential advantage of Ambisonics is its lack of dependency on speaker position. Unlike 5.1, the channels carried in an Ambisonic signal do not map directly onto speakers: the same signal can be decoded to any speaker array, so the number of speakers and the way the listener has set them up matters far less. This flexibility would allow one common set of signals to be sent to everyone, with each listener decoding it to suit their own listening environment. It also has obvious advantages from an archival point of view: unlike stereo, 5.1 and 7.1 mixes, keeping Ambisonic recordings could help future-proof the archive. In my next post I'll talk about what we've done so far, and what we might do with Ambisonics in the future.
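To illustrate that layout independence, here is a basic (non-optimised) first-order horizontal decode in Python: each speaker feed is just a weighted sum of W, X and Y, so the same B-format signal can drive a square, an ITU 5.1 layout, or anything else. This is only a sketch under my own naming; production decoders add psychoacoustic shelf filtering and layout-specific optimisation.

```python
import math

def decode_b_format(w, x, y, speaker_azimuths_deg):
    """Decode horizontal B-format to one feed per speaker, for any
    number of speakers at any azimuths."""
    n = len(speaker_azimuths_deg)
    feeds = []
    for az_deg in speaker_azimuths_deg:
        az = math.radians(az_deg)
        feeds.append((w * math.sqrt(2) + 2 * (x * math.cos(az) + y * math.sin(az))) / n)
    return feeds

# The same signal decodes to different layouts:
square = decode_b_format(0.707, 1.0, 0.0, [45, 135, 225, 315])
itu_five = decode_b_format(0.707, 1.0, 0.0, [0, 30, 110, 250, 330])
```

For a front-panned source, the two front speakers of the square receive most of the energy, while the rear pair get a small anti-phase component.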
Comment number 1.
At 11th Mar 2010, GarageYears wrote: As one of those "select audio engineers" actively using an Ambisonics-based audio system in the product that I work with, I am very encouraged to see the BBC following this path.
It may have been a long time coming (and I'm pretty sure various departments have worked on Ambisonics within the BBC previously), but I can only hope this continues and leads to something mainstream. Ambisonics is really a very neat solution to the problem of delivering surround sound to various playback configurations. Please keep posting your progress!
Comment number 2.
At 12th Mar 2010, HD wrote: That's good for audio, though we already have surround sound but not surround video yet. Maybe the two together would be better. Also, I think it's a bit unfair on the video side, because they are going for more and more audio channels, e.g. 5.1, 7.1, or even higher (e.g. 22.2 for Super High Vision), yet for video we usually only have 1, or soon 2 (for stereoscopic "3D") but no better (e.g. multiview or 3D).
Also, for audio, they are going for uncompressed or lossless compressed audio, but they never do that for consumer video. And they are using ever higher audio sample rates because they know that produces better, more accurate sound, but the BBC (BBC HD) are going for ever lower video sample rates (e.g. usually shooting video at 25 Hz instead of the 50 Hz we've had for years with SD, and feature films are usually made at 24 fps, the same as they have been since sound was added to film many years ago), despite the fact that BBC research has said that we should be increasing the video frame rate much higher (otherwise we lose the advantages of HDTV over SDTV for moving things). High quality video needs a high video sample rate (frame rate) for accurate representation, and so it doesn't judder or strobe, just like audio needs a high sample rate for accurate representation of audio.
Comment number 3.
At 12th Mar 2010, John Leonard wrote: Good to see that this is work in progress at the BBC once again. I see from the video that you appear to be using Nuendo on a Mac and I'd like to know whose plug-ins you're using to do the transcoding. I'd also be interested in knowing where you obtained the recordings of Spitfires at Duxford that are mentioned in the video. Are these BBC recordings, or have you obtained them from somewhere else?
Keep up the good work.
John
Comment number 4.
At 12th Mar 2010, Richard Lee wrote: > These ITU standards were borne out from a lot of research into which angles gave the best sound...
Could you point us to the work which concluded that the ITU-R 5.1 layout gives the best sound? AFAIK, this was a crude attempt to replicate the layouts in cinema. It is very suboptimal for surround sound. I have only seen it in a very small number of studios and research establishments. It is unknown in domestic environments.
Among domestic listeners who have tried to place speakers properly, a square (or near square) is by far the most common layout with the listener somewhat back from centre.
Ambisonic decode to a square works very well for these common layouts. Listeners instinctively move to the centre of the square when they encounter good surround material.
has a number of papers investigating this.
Comment number 5.
At 13th Mar 2010, Richard E wrote: Nice to see the BBC working on Ambisonics again, and that it hopefully isn't impacted by the disappearance of Kingswood Warren.
My own research on surround layouts found in the wild supports your observations that most people listen in a square, or at least a rectangle to which a square is a reasonable approximation. I do not believe this is limited to gamers, but is quite common in average home theatre environments. In addition, Ambisonics is very robust and can take a bit of speaker positioning error without destroying the illusion.
I presume you are looking at working and archiving in B format, in which case you could derive a suitable decode at any time for whatever discrete-channel system and/or speaker layout is the fad at the time, should it not be possible to transmit the B-format content to the end-user to render at the listening end.
I have a paper on the effectiveness of square decodes and the use of studio-based decoding of Ambisonics here: [Unsuitable/Broken URL removed by Moderator] (PDF) which I hope may be of interest.
Comment number 6.
At 13th Mar 2010, Richard E wrote: You can access the paper mentioned above, "Getting Ambisonics Around", from the Articles section on Ambisonic.net.