Reviews on Technology and Standard of Spatial Audio Coding


Ikhwana Elfitri
Amirul Luthfi

Abstract

Market demand for more impressive entertainment media has motivated the delivery of three-dimensional (3D) audio content to home consumers through Ultra High Definition TV (UHDTV), the next generation of TV broadcasting, in which spatial audio coding plays a fundamental role. This paper reviews the fundamental concepts of spatial audio coding, covering its technology, standards, and applications. The basic principle of object-based audio reproduction is also explained and compared with the traditional channel-based approach. The aim is to give a clear understanding of this popular interactive reproduction system, which lets end users render their own preferred audio composition.
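The sketch below illustrates the contrast between object-based and channel-based reproduction. It is a minimal, hypothetical example and is not taken from the paper or from any MPEG standard.

In a channel-based system, the loudspeaker feeds arrive already mixed. In an object-based system, each object is delivered as a mono signal plus metadata, and the receiver renders it.

- **Metadata:** each object carries an azimuth and a gain. The names `AudioObject`, `pan_gains`, and `render_stereo` are illustrative assumptions.
- **Layout and panning:** the ±30° stereo layout and the constant-power sine/cosine panning law are also assumptions.
- **User control:** the `user_gains` dictionary stands in for end-user interactivity, for example boosting dialogue.

```python
from __future__ import annotations

import math
from dataclasses import dataclass


@dataclass
class AudioObject:
    """Hypothetical audio object: a mono signal plus rendering metadata."""
    name: str
    samples: list[float]
    azimuth_deg: float  # in this sketch: -30 = left speaker, +30 = right speaker
    gain: float = 1.0


def pan_gains(azimuth_deg: float, speaker_half_angle: float = 30.0) -> tuple[float, float]:
    """Constant-power (sine/cosine) panning between two loudspeakers."""
    # Clamp the azimuth to the loudspeaker span, then map it to [0, pi/2].
    a = max(-speaker_half_angle, min(speaker_half_angle, azimuth_deg))
    theta = (a + speaker_half_angle) / (2 * speaker_half_angle) * (math.pi / 2)
    return math.cos(theta), math.sin(theta)  # (left gain, right gain)


def render_stereo(objects: list[AudioObject],
                  user_gains: dict[str, float] | None = None) -> tuple[list[float], list[float]]:
    """Mix objects into two channels at the receiver, applying optional user gains."""
    user_gains = user_gains or {}
    n = max(len(o.samples) for o in objects)
    left = [0.0] * n
    right = [0.0] * n
    for obj in objects:
        # Producer-defined gain times the listener's own preference for this object.
        g = obj.gain * user_gains.get(obj.name, 1.0)
        gl, gr = pan_gains(obj.azimuth_deg)
        for i, s in enumerate(obj.samples):
            left[i] += g * gl * s
            right[i] += g * gr * s
    return left, right


if __name__ == "__main__":
    dialogue = AudioObject("dialogue", [1.0, 0.5], azimuth_deg=0.0)
    music = AudioObject("music", [0.2, 0.2], azimuth_deg=-30.0)
    # The listener doubles the dialogue level without touching the music object.
    print(render_stereo([dialogue, music], user_gains={"dialogue": 2.0}))
```

Because the objects stay separate until rendering, the same bitstream could be rendered to a different layout by swapping the panning function. A fixed channel-based mix does not allow this without upmixing or downmixing.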

Keywords: spatial audio, audio coding, multi-channel audio signals, MPEG standard, object-based audio
