Stereo Signal Decomposition and Upmixing to Surround and 3D Audio
Publication date
2022
Document type
PhD thesis (dissertation)
Author
Kraft, Sebastian
Advisor
Referee
Spors, Sascha
Granting institution
Helmut-Schmidt-Universität / Universität der Bundeswehr Hamburg
Exam date
2022-04-29
Organisational unit
Part of the university bibliography
✅
DDC Class
600 Technik
Keyword
Time-frequency
Abstract
The invention of spatial audio playback techniques dates back to the 1930s, and today 3D playback systems are commonly available in cinemas, event venues and even in consumers' home cinema or car audio systems. While the movie industry quickly adopted new formats to support multi-channel loudspeaker installations, music has been produced and distributed almost exclusively as two-channel stereo for decades and still is today. To benefit from multi-channel spatial audio playback systems, for example through enhanced envelopment of the listener and more precise localisation of sound sources, it is necessary to remix or upmix such legacy content. This thesis describes the construction of an automatic upmix processor, which involves three main components that have to be developed and integrated: time-frequency transformation, spatial signal analysis and decomposition, and decorrelation and repanning strategies that take the target loudspeaker configuration into account. First, two approaches to transforming a signal into a time-frequency representation are compared. The short-time Fourier transform (STFT) is easy to implement thanks to widely available and optimised FFT libraries, but it introduces a processing latency of one sample block and is limited to linearly spaced subbands. As an alternative, a solution based on a filter bank is described. It offers a flexible configuration of subbands and can achieve a group delay that allows for real-time applications. A novel method is proposed to decompose the time-frequency domain signal by its spatial characteristics into an amplitude-panned direct component and an uncorrelated ambient component. The directions of the direct signal sources are estimated from the stereo input signal power and then steer the decomposition into the direct and ambient parts.
Unlike similar algorithms, which frequently suffer from out-of-phase ambient signals, the proposed method treats the ambient signal phase relationship as a free parameter that can be adjusted to achieve an optimal correlation between the left and right ambient signals close to zero. The separated ambient signal is further decorrelated into a sufficient number of channels to feed all available loudspeakers and yield a diffuse, three-dimensional ambient sound field. A frequency-domain decorrelation strategy is introduced for that purpose, in which the signal is processed with a tree of magnitude-complementary filters. By including vector base amplitude panning (VBAP) in the framework, the separated direct signal component can easily be repanned to varying target loudspeaker configurations. The complete upmix processor has been implemented in a real-time environment, and its computational demands are analysed and discussed. A considerable performance gain was achieved by embedding a highly optimised FFT library and by making use of manual or automatic vectorisation. The upmix received positive feedback in frequent informal listening sessions; in particular, the naturalness of the source rendering and the clean, diffuse ambient sound field were rated positively. Finally, a simplified low-cost variant of the upmix processor is described that uses a filter bank for the time-frequency transform. Even a few logarithmically spaced subbands turned out to be sufficient to yield a decent separation of overlapping sound sources. The filter bank implementation does not offer the same sonic possibilities as the STFT, but it is computationally more efficient, and due to sample-by-sample processing the latency and memory consumption are reduced significantly.
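The repanning step mentioned in the abstract relies on vector base amplitude panning. As an illustration only, the following minimal sketch computes textbook 2D VBAP gains for a single loudspeaker pair. It is not taken from the thesis: the function name `vbap_pair_gains`, the angle convention (degrees, counter-clockwise, 0 = straight ahead) and the power normalisation are assumptions made for this example.

```python
import math


def vbap_pair_gains(source_deg, left_deg, right_deg):
    """Return power-normalised gains (g_left, g_right) for one 2D loudspeaker pair.

    Hypothetical helper for illustration; angles are in degrees.
    """
    def unit(angle_deg):
        a = math.radians(angle_deg)
        return math.cos(a), math.sin(a)

    px, py = unit(source_deg)   # desired source direction
    lx, ly = unit(left_deg)     # left loudspeaker direction
    rx, ry = unit(right_deg)    # right loudspeaker direction

    # Solve p = g_l * l + g_r * r for the two gains (Cramer's rule).
    det = lx * ry - ly * rx
    if abs(det) < 1e-12:
        raise ValueError("loudspeaker directions must not be collinear")
    g_l = (px * ry - py * rx) / det
    g_r = (lx * py - ly * px) / det

    # Normalise so that g_l^2 + g_r^2 = 1 (constant power).
    # A negative gain indicates the source lies outside the pair's arc.
    norm = math.hypot(g_l, g_r)
    return g_l / norm, g_r / norm


if __name__ == "__main__":
    # Source centred between loudspeakers at +30 and -30 degrees.
    print(vbap_pair_gains(0.0, 30.0, -30.0))
```

For a full loudspeaker layout, one would presumably select the pair (or, in 3D, the triplet) that encloses the source direction before applying this computation; that selection step is omitted here.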
Version
Not applicable (or unknown)
Access right on openHSU
Open access