  • Publication
    Open Access
    Spatial Audio Through Headphones Based on HRTFs Approximated by Parametric IIR Filters
    (Universitätsbibliothek der HSU / UniBwH, 2022-06)
    Helmut-Schmidt-Universität / Universität der Bundeswehr Hamburg
    The subject of this dissertation is spatial audio through headphones. In the present work, an offline binaural synthesis implementation is proposed that uses head-related transfer functions (HRTFs) approximated by cascades of parametric infinite impulse response (IIR) filters, parameter interpolation to calculate HRTFs of intermediate directions for generating static as well as moving virtual sound sources, and simulated room effects to increase the perceived externalization. The first contribution to the research field lies in representing HRTFs as cascades of low-order parametric IIR filters together with a delay representing the interaural time difference (ITD). Usually, HRTFs are represented as finite impulse response (FIR) filters containing the corresponding head-related impulse responses (HRIRs) as filter coefficients. However, by using cascades of low-order parametric IIR filters, such as first-order shelving or second-order peak filters, the memory requirements of the hardware can be reduced to three parameters per filter stage (cut-off or center frequency, gain, and Q-factor). For this purpose, a two-step procedure is proposed that approximates the magnitude responses of HRTFs by parametric IIR filter cascades. In the first step, the individual filter stages are consecutively integrated, initialized, and tuned. Afterwards, the interaction between individual filter stages is post-optimized. Alternatively, an approach for HRTF magnitude response approximation based on instantaneous backpropagation is proposed. After the HRTF magnitude responses have been approximated, the ITDs also have to be extracted from the HRIRs or HRTFs of the two ears. Virtual sound sources are then generated by filtering a monaural audio signal with the parametric IIR filter cascades of the desired direction and delaying the filtered audio signal of the contralateral ear by the extracted ITD.
In many practical implementations, only a finite number of measured HRTFs is available, resulting in a limited spatial resolution. For HRTFs represented as FIR filters, bilinear rectangular or triangular interpolation can be used to calculate the filter coefficients of intermediate HRTFs. However, when the HRTFs are represented as IIR filters instead, the interpolation is not as straightforward as for FIR filters due to stability considerations. Therefore, in this work, a parameter interpolation algorithm based on bilinear interpolation of the parameters of the individual filter stages together with an assignment of related peak filters is proposed. This interpolation algorithm guarantees the stability of intermediate filters. When generating moving virtual sound sources, two IIR filter cascades are combined in parallel following the cross-fading input-switching combination approach. For evaluating the proposed methods, three listening tests assessing different aspects of binaural synthesis using HRTFs approximated by parametric IIR filters are performed. In a first listening test, the validity of the proposed parametric IIR filter cascades is proven for static virtual sound sources by comparing their localization results to localization results achieved using HRIRs represented as FIR filters. Additionally, a second listening test proves that adding simulated room effects via the image source model increases the perceived externalization of static virtual sound sources generated using HRTFs approximated by parametric IIR filter cascades up to externalization levels achieved using measured binaural room impulse responses represented as FIR filters. Finally, the audio quality of moving virtual sound sources generated using minimum-phase approximated HRIRs represented as FIR filters and parametric IIR filter cascades is evaluated in a third listening test. 
By using two IIR filters in parallel following the cross-fading input-switching combination approach, audio quality ratings comparable to those of FIR filter implementations using minimum-phase approximated HRIRs are achieved. Thus, HRTFs approximated by parametric IIR filter cascades can be used to reduce the number of stored coefficients. By using two first-order shelving filters, ten second-order peak filters, a mean HRTF magnitude value, and an extracted ITD, only 36 parameters have to be stored per HRTF instead of 200 coefficients as in FIR filter implementations using conventional HRIRs.
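As a rough illustration of the parametric representation described in this abstract, the sketch below turns one (center frequency, gain, Q-factor) triple into second-order peak filter coefficients and checks the parameter count. The coefficient formulas are the widely used Audio EQ Cookbook peaking form, which is an assumption here; the dissertation may use a different filter design. The parameter breakdown (two parameters per shelving stage, three per peak stage) is likewise only one split that is consistent with the stated total of 36, and all function names are hypothetical.

```python
import cmath
import math


def peak_filter_coeffs(fc, gain_db, q, fs):
    """Second-order peak filter (Audio EQ Cookbook form, an assumed design).

    Returns (b, a) normalized so that a[0] == 1.
    """
    a_lin = 10.0 ** (gain_db / 40.0)       # square root of the linear peak gain
    w0 = 2.0 * math.pi * fc / fs           # normalized center frequency
    alpha = math.sin(w0) / (2.0 * q)       # bandwidth term derived from Q
    a0 = 1.0 + alpha / a_lin
    b = [(1.0 + alpha * a_lin) / a0, -2.0 * math.cos(w0) / a0, (1.0 - alpha * a_lin) / a0]
    a = [1.0, -2.0 * math.cos(w0) / a0, (1.0 - alpha / a_lin) / a0]
    return b, a


def magnitude_db(b, a, f, fs):
    """Magnitude response in dB of one biquad stage at frequency f."""
    z1 = cmath.exp(-1j * 2.0 * math.pi * f / fs)   # z^-1 on the unit circle
    num = b[0] + b[1] * z1 + b[2] * z1 * z1
    den = a[0] + a[1] * z1 + a[2] * z1 * z1
    return 20.0 * math.log10(abs(num / den))


def is_stable(a):
    """True if both poles of z^2 + a[1] z + a[2] lie inside the unit circle."""
    disc = cmath.sqrt(a[1] * a[1] - 4.0 * a[2])
    poles = [(-a[1] + disc) / 2.0, (-a[1] - disc) / 2.0]
    return all(abs(p) < 1.0 for p in poles)


# Assumed breakdown: 2 shelving stages x 2 parameters (cut-off, gain),
# 10 peak stages x 3 parameters (center frequency, gain, Q),
# plus the mean HRTF magnitude value and the ITD.
params_per_hrtf = 2 * 2 + 10 * 3 + 1 + 1

b, a = peak_filter_coeffs(fc=3000.0, gain_db=6.0, q=2.0, fs=44100.0)
print(params_per_hrtf, round(magnitude_db(b, a, 3000.0, 44100.0), 6), is_stable(a))
```

Because a stage designed this way is stable for any positive Q and any center frequency below the Nyquist frequency, interpolating parameters and then redesigning the stage keeps intermediate filters stable, which is in line with the motivation given for interpolating parameters rather than coefficients.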
  • Publication
    Open Access
    Deep Learning for Image Enhancement
    (Universitätsbibliothek der HSU / UniBwH, 2022-05)
    Helmut-Schmidt-Universität / Universität der Bundeswehr Hamburg
    Deep learning belongs to the family of artificial intelligence and machine learning methods, where the primary objective is to learn and diversify the feature representation for a given system. In deep learning, a machine is able to develop large parameterized models that address a plethora of scientific problems based on a number of optimization methods. These models are capable of retrieving, representing, generating, and combining a large number of features to provide a generalized solution to the intended problems. Unlike traditional machine learning algorithms, deep learning algorithms offer an opportunity to learn, extract, and even generate very large feature spaces via densely parameterized models, which are capable of learning semantic information and an efficient input-output mapping. Hence, they are well suited to low-level computer vision applications involving multimedia enhancement problems. Deep learning has a very broad scope, but this thesis is primarily focused on artificial neural networks, convolutional neural networks, and their variants, which are some of the most powerful deep learning tools today. In this work, the neural network fundamentals are explained, the corresponding derivations are performed, and the workflows are illustrated. Important modules of convolutional neural networks are described and their functions are discussed. Various convolutional architectures are proposed for computer vision tasks related to image quality improvement, and their suitability for the particular problems is explained. Various networks, which include novel network modules and architectures, are studied and applied in the areas of image and video enhancement. Ablation studies and experiments are performed on the network architectures to analyze them. Finally, the proposed models are evaluated in terms of their performance on the aforementioned vision tasks.
  • Publication
    Open Access
    Stereo Signal Decomposition and Upmixing to Surround and 3D Audio
    (Universitätsbibliothek der HSU / UniBwH, 2022)
    Kraft, Sebastian
    Helmut-Schmidt-Universität / Universität der Bundeswehr Hamburg
    Spors, Sascha
    The invention of spatial audio playback techniques dates back to the 1930s and today 3D playback systems are commonly available in cinemas, event venues and even in the consumer’s home cinema or car audio system. While the movie industry quickly adopted new formats to support multi-channel loudspeaker installations, music has been produced and distributed almost exclusively as two-channel stereo for decades and still is today. To gain benefit from multi-channel spatial audio playback systems, for example an enhanced envelopment of the listener and a more precise localisation of sound sources, it is necessary to remix or upmix such legacy content. The building of an automatic upmix processor is described in the course of this thesis and involves three main components that have to be developed and integrated: time-frequency transformation, spatial signal analysis and decomposition as well as decorrelation and repanning strategies under consideration of the target loudspeaker configuration. Two approaches to transform a signal into a time-frequency representation are compared initially. The short-time Fourier transform is easy to implement due to widely-available and optimised FFT libraries. But it also introduces a processing latency of one sample block and is limited to linearly-spaced subbands. Furthermore, an alternative solution based on a filter bank is described. It offers a flexible configuration of subbands and can achieve a group delay that allows for real-time applications. A novel method is proposed to decompose the time-frequency domain signal by its spatial characteristics into an amplitude-panned direct and an uncorrelated ambient component. The direct signal source directions are estimated from the stereo input signal power and afterwards steer the decomposition into the direct and ambient part. 
Unlike similar algorithms, which frequently suffer from out-of-phase ambient signals, the ambient signal phase relationship is a free parameter and can be adjusted to achieve an optimal left and right ambient signal correlation close to zero. The separated ambient signal is further decorrelated into a sufficient number of channels to feed all available loudspeakers and yield a diffuse and three-dimensional ambient sound field. A frequency-domain decorrelation strategy is introduced for that purpose, in which the signal is processed with a tree of magnitude-complementary filters. By including vector base amplitude panning in the framework, the separated direct signal component can easily be repanned to varying target loudspeaker configurations. The complete upmix processor has been implemented in a real-time environment, and the computational demands are analysed and discussed. A considerable performance gain was achieved after embedding a highly optimised FFT library and by making use of manual or automatic vectorisation. The upmix received positive feedback in frequent informal listening sessions; in particular, the naturalness of the source rendering and the clean and diffuse ambient sound field were rated positively. Finally, a simplified low-cost variant of the upmix processor is described that makes use of a filter bank for the time-frequency transform. Even a few logarithmically spaced subbands turned out to be sufficient to yield a decent separation of overlapping sound sources. The filter bank implementation does not offer the same sonic possibilities as the STFT, but it is computationally more efficient and, owing to sample-by-sample processing, latency and memory consumption are reduced significantly.
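The repanning step mentioned in this abstract relies on vector base amplitude panning (VBAP). The following is a minimal two-dimensional sketch of the standard pairwise VBAP gain calculation, not the thesis implementation; the function name, the angle convention (degrees, counter-clockwise from the front), and the power normalization are assumptions made for illustration.

```python
import math


def vbap_2d(source_deg, spk1_deg, spk2_deg):
    """Gains for one loudspeaker pair so that g1*l1 + g2*l2 points at the source.

    l1 and l2 are the unit vectors towards the two loudspeakers. The gains
    are normalized to unit power (g1^2 + g2^2 == 1).
    """
    px, py = math.cos(math.radians(source_deg)), math.sin(math.radians(source_deg))
    l1x, l1y = math.cos(math.radians(spk1_deg)), math.sin(math.radians(spk1_deg))
    l2x, l2y = math.cos(math.radians(spk2_deg)), math.sin(math.radians(spk2_deg))

    det = l1x * l2y - l1y * l2x
    if abs(det) < 1e-12:
        raise ValueError("loudspeaker directions must not be collinear")

    # Solve the 2x2 system [l1 l2] [g1 g2]^T = p with Cramer's rule.
    g1 = (px * l2y - py * l2x) / det
    g2 = (l1x * py - l1y * px) / det

    norm = math.hypot(g1, g2)
    return g1 / norm, g2 / norm


# Source straight ahead between loudspeakers at +30 and -30 degrees.
print(vbap_2d(0.0, 30.0, -30.0))
```

For a different target layout only the loudspeaker angles change, which is why an estimated direct-signal direction can be re-rendered on varying loudspeaker configurations.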
  • Publication
    Open Access
    Beiträge zur Digitalisierung eines Hochfrequenz-Kondensatormikrofons: Von der Fakultät für Elektrotechnik der Helmut-Schmidt-Universität / Universität der Bundeswehr Hamburg zur Erlangung des akademischen Grades eines Doktor-Ingenieurs genehmigte Dissertation
    (Universitätsbibliothek der HSU / UniBwH, 2020)
    Urbansky, Lars
    Helmut-Schmidt-Universität / Universität der Bundeswehr Hamburg
    The range of applications for microphones is very broad. Depending on the use case, a wide variety of sizes and sound transduction techniques are employed. One transducer type is the condenser microphone. It can be operated electronically in either a low-frequency or a high-frequency circuit, with the high-frequency circuit offering advantages. However, such an implementation is more complex. Both circuit variants are commercially available, but the portfolio of high-frequency condenser microphones is small. Furthermore, the high-frequency circuits have so far been realized entirely in analog form. The current trend, however, is towards increasing digitization of analog circuits. This work presents contributions towards a digitized high-frequency condenser microphone. The underlying sound transduction itself remains analog because of the transduction principle (a time-variant capacitor). The additional components required for operation are digitized. First, a digital carrier signal is generated and converted to analog. The carrier signal is fed into an analog measuring bridge circuit. The time-variant capacitance connected in the bridge modulates the amplitude of the carrier signal. The resulting amplitude-modulated signal is converted to digital and fed to a digital coherent demodulator. After additional digital signal processing, a digital audio signal is available. Owing to the digital demodulation, the analog signal is always a bandpass signal that is free of low-frequency electrical noise influences. Furthermore, the sensitivity of the microphone can be controlled directly in the digital domain by adjusting the amplitude of the carrier signal, which ultimately enables a purely digital approach to automatic dynamic range extension of the system.
The overall system of this novel high-frequency condenser microphone with digital signal processing achieves a higher signal-to-noise ratio than the current state of the art. In addition, the automatic dynamic range extension shows that the dynamic range of the system can be increased.
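The signal chain in this abstract (a carrier that is amplitude-modulated by the capsule and then coherently demodulated in the digital domain) can be illustrated with a small sketch. It is a textbook coherent demodulator under simplifying assumptions, not the circuit or algorithm of the dissertation: the carrier phase is assumed to be known exactly, the sampling rate is assumed to be an integer multiple of the carrier frequency, and a moving average over one carrier period stands in for the low-pass filter. All names and values are hypothetical.

```python
import math


def coherent_demodulate(am, fc, fs):
    """Recover the envelope of an AM signal whose carrier is cos(2*pi*fc*n/fs).

    The signal is mixed with a phase-aligned carrier copy (scaled by 2) and
    averaged over one carrier period, which cancels the component at 2*fc.
    """
    period = round(fs / fc)   # samples per carrier period (assumed integer)
    mixed = [2.0 * s * math.cos(2.0 * math.pi * fc * n / fs) for n, s in enumerate(am)]
    return [sum(mixed[n - period + 1:n + 1]) / period
            for n in range(period - 1, len(mixed))]


def modulate(envelope, fc, fs):
    """Amplitude-modulate a carrier with the given (positive) envelope."""
    return [e * math.cos(2.0 * math.pi * fc * n / fs) for n, e in enumerate(envelope)]


fs, fc = 48000, 6000                      # illustrative values only
constant_envelope = [0.8] * 64            # stands in for a fixed capsule capacitance
recovered = coherent_demodulate(modulate(constant_envelope, fc, fs), fc, fs)
print(min(recovered), max(recovered))
```

Scaling the carrier amplitude scales the recovered envelope by the same factor, which is the mechanism the abstract refers to when it describes controlling the microphone sensitivity digitally.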
  • Publication
    Open Access
    System Identification of Nonlinear Audio Circuits: Von der Fakultät für Elektrotechnik der Helmut-Schmidt-Universität/Universität der Bundeswehr Hamburg zur Erlangung des akademischen Grades eines Doktor-Ingenieurs genehmigte Dissertation vorgelegt von
    (Universitätsbibliothek der HSU / UniBwH, 2020)
    Eichas, Felix
    Helmut-Schmidt-Universität / Universität der Bundeswehr Hamburg
    Välimäki, Vesa
    Digital systems are gaining more and more popularity in today's music industry. Musicians and producers are using digital systems because of their advantages over analog electronics. They require less physical space, are cheaper to produce, and are not prone to aging circuit components or temperature variations. Furthermore, they always produce the same output signal for a defined input sequence. However, musicians like vintage equipment. Old guitar amplifiers or legendary recording equipment are sold at very high prices. Therefore, it is desirable to create digital models of analog music electronics which can be used in modern digital environments. This work presents an approach for recreating nonlinear audio circuits using system identification techniques. Measurements of the input and output signals of the analog reference devices are used to adjust a digital model, treating the reference device as a ‘black box’. With this technique, the schematic of the reference device does not need to be known and no circuit elements have to be measured to recreate the analog device. An appropriate block-based model is chosen, depending on the type of reference system. Then the parameters of the digital model are adjusted with an optimization method according to the measured input and output signals. The performance of the optimized digital model is evaluated with objective scores and listening tests. Two types of nonlinear reference systems are examined in this work. The first type comprises dynamic range compressors such as the ‘MXR Dynacomp’, the ‘Aguilar TLC’, or the ‘UREI 1176LN’. A block-based model describing a generic dynamic range compression system is chosen and an automated routine is developed to adjust it. The adapted digital models are evaluated with objective scores, and a listening test is performed for the UREI 1176LN studio compressor. The second type comprises distortion systems such as amplifiers for electric guitars.
This work presents novel modeling approaches for different kinds of distortion systems, from basic distortion circuits which can be found in distortion pedals for guitars to (vintage) guitar amplifiers like the ‘Marshall JCM900’ or the ‘Fender Bassman’. The linear blocks of the digital model are measured and used in the model, while the nonlinear blocks are adapted with parameter optimization methods like the Levenberg–Marquardt method. The quality of the adjusted models is evaluated with objective scores and listening tests. The adjusted digital models give convincing results and can be implemented as real-time digital versions of their analog counterparts. This enables the musician to save a snapshot of a certain sound and recall it at any time with a digital system such as a VST plug-in or a program on dedicated hardware.
  • Publication
    Open Access
    Hybrid and Pseudo-Cascaded Active Noise Control Applied to Headphones
    (Universitätsbibliothek der HSU / UniBwH, 2020)
    Rivera Benois, Piero Iared
    Helmut-Schmidt-Universität / Universität der Bundeswehr Hamburg
    The subject of this dissertation is the active control of acoustical noise by means of headphones. The contribution to the research field lies in three novel control structures and the optimization of their parameters. These control structures simultaneously combine the three classical control schemes, namely the feedforward, the minimum variance, and the internal model control schemes, into one system, without requiring additional microphones or loudspeakers. The optimization of the controller parameters is achieved in two stages. First, the minimum variance and the internal model controllers are co-optimized subject to the stability, performance, and controller gain constraints developed in this work. Second, based on their parameters and the novel control structure to be used, the feedforward controller is optimized. This can be done once for a fixed controller implementation by following a Wiener controller derivation. Alternatively, the optimization can be done continuously over time based on the implementation of an adaptive controller. To achieve this, the Modified Normalized Filtered-x LMS algorithm is integrated with the novel control structures such that a minimum of memory and computational resources is required. The fixed controllers are evaluated by means of simulations of an ANC headphones prototype subject to an ipsilateral free-field excitation. From the results it is concluded that, if the impulse response of the feedforward controller is as long as that of the primary path, then the performance of the novel control structure is the same as that of a classical feedforward scheme. However, if the impulse response of the feedforward controller is shorter than that of the primary path, then the minimum variance and internal model controllers effectively extend its impulse response, such that it approximates that of a longer controller. Thus, the novel control structures achieve a better performance.
The adaptive feedforward controller is evaluated by means of an ANC headphones prototype under ipsilateral and contralateral stochastic noise excitation. It is found that, under ipsilateral excitation, the results achieved in the simulations are corroborated. However, it is also found that the adaptation algorithms of the control structures are subject to some deterioration if the impulse responses of the modelled systems are not sufficiently long. Under contralateral excitation, the novel control structures proved to be subject to a dominant additive noise introduced by the feedforward controller. Nevertheless, it is also found that the combined performance of the minimum variance and internal model controllers is resilient to the contralateral excitation. Hence, if the problem of the feedforward controller can be solved, the increase in performance of the novel structures can also be achieved under these circumstances.
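The adaptive controller in this abstract is based on the Modified Normalized Filtered-x LMS algorithm. As a much simpler stand-in, the sketch below shows only the plain normalized LMS (NLMS) update that such algorithms build on, applied to identifying a short FIR path from noise. It omits the secondary-path filtering and the modified structure entirely; the path coefficients, step size, and function names are hypothetical.

```python
import random


def fir_filter(h, x):
    """Causal FIR filtering of x with impulse response h (zero initial state)."""
    return [sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0)
            for n in range(len(x))]


def nlms_identify(x, d, num_taps, mu=0.5, eps=1e-8):
    """Plain NLMS: adapt FIR weights w so that filtering x approximates d."""
    w = [0.0] * num_taps
    buf = [0.0] * num_taps                      # most recent input samples, newest first
    for xn, dn in zip(x, d):
        buf = [xn] + buf[:-1]
        error = dn - sum(wi * bi for wi, bi in zip(w, buf))
        power = eps + sum(bi * bi for bi in buf)   # normalization by input power
        w = [wi + mu * error * bi / power for wi, bi in zip(w, buf)]
    return w


random.seed(0)
true_path = [0.5, -0.3, 0.1]                    # hypothetical path to be identified
noise = [random.uniform(-1.0, 1.0) for _ in range(2000)]
observed = fir_filter(true_path, noise)
estimate = nlms_identify(noise, observed, num_taps=len(true_path))
print([round(c, 6) for c in estimate])
```

If the adaptive filter is given fewer taps than the path it models, the estimate can no longer match the path exactly, which relates to the abstract's observation that adaptation deteriorates when the modelled impulse responses are not sufficiently long.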
  • Publication
    Open Access
    Enhancements for Networked Music Performances
    (Universitätsbibliothek der HSU / UniBwH, 2018)
    Fink, Marco
    Helmut-Schmidt-Universität / Universität der Bundeswehr Hamburg
    The availability and capability of today’s internet allow several novel challenging interactive multimedia applications like Networked Music Performances (NMP). A Networked Music Performance is an online artistic collaboration with musicians located at different geographic locations connected using the internet. While offering manifold artistic possibilities, many technical challenges like the resulting latency and the possibility of packet loss have to be considered. This work depicts three enhancements for NMP applications which improve error robustness, the algorithmic delay, and the spatial listening experience, respectively. To counteract the possibility of packet loss or network jitter-caused tardy arrival of packets, this work derives two methods to conceal errors during audio replay at the receiver side. The first, auto-regressive model-based variant facilitates concealing the audible impact of missing packets with high quality but is computationally expensive. Several ways of computing the auto-regressive model are presented and compared. The second method, based on wave-form substitution, constitutes an efficient, cheap alternative. The proposed methods are evaluated subjectively with a listening test and objectively with measurements of perceptual quality. The application of audio codecs in NMP sessions is inevitable in most scenarios due to the restricted data rate and in particular the upload rate of private internet accesses. Besides reducing the data rate the codec must feature a small algorithmic latency to restrict the overall latency to a certain extent. A novel audio coding approach which features smaller delays than widely used low-delay codecs and a clearly reduced data rate in contrast to delay-less codecs is presented. 
It is constructed using the Adaptive Differential Pulse Code Modulation (ADPCM) codec approach in subbands in combination with a Vector Quantizer (VQ), resulting in the Vector-Quantized Adaptive Differential Pulse Code Modulation (VQ-ADPCM) codec. The proposed codec is capable of encoding broadband audio with a data rate of 64 kbit/s and an algorithmic delay of about 1 ms. The perceptual quality is compared to well-known codecs using perceptually motivated measurements. The last contribution is intended to improve the acoustic spatial scenery within an NMP. For this purpose, a pseudo-stereo conversion method providing a broad stereo panorama for single-channel sound sources is derived. The method enhances the spaciousness of the stereo mix at the receiver without adding timbral coloration or reverberation and therefore offers an improved listening experience for NMP participants. The proposed method is based on the design of a complementary filter pair, which can be applied in the time and frequency domains. Additionally, the integration within a virtual surround mixer based on Head-Related Impulse Responses (HRIRs) is demonstrated. Virtual surround mixing allows the arbitrary positioning of several sound sources in a virtual room. The extension with the proposed pseudo-stereo approach even makes it possible to define sound sources of a certain size instead of single point sources. The three proposed enhancements are purely based on digital signal processing and can therefore be implemented in the software layer of any NMP system without demanding any changes to the actual musical performance, the utilized hardware, or the available network structure.
  • Publication
    Metadata only
    Changes in room acoustics elicit a mismatch negativity in the absence of overall interaural intensity differences
    (2017)
    Frey, Johannes D.
    Wendt, Mike
    Möller, Stephan
    Changes in room acoustics provide important clues about the environment of sound source-perceiver systems, for example, by indicating changes in the reflecting characteristics of surrounding objects. To study the detection of auditory irregularities brought about by a change in room acoustics, a passive oddball protocol with participants watching a movie was applied in this study. Acoustic stimuli were presented via headphones. Standards and deviants were created by modelling rooms of different sizes, keeping the values of the basic acoustic dimensions (e.g., frequency, duration, sound pressure, and sound source location) as constant as possible. In the first experiment, each standard and deviant stimulus consisted of sequences of three short sounds derived from sinusoidal tones, resulting in three onsets during each stimulus. Deviant stimuli elicited a Mismatch Negativity (MMN) as well as two additional negative deflections corresponding to the three onset peaks. In the second experiment, only one sound was used; the stimuli were otherwise identical to the ones used in the first experiment. Again, an MMN was observed, followed by an additional negative deflection. These results provide further support for the hypothesis of automatic detection of unattended changes in room acoustics, extending previous work by demonstrating the elicitation of an MMN by changes in room acoustics.