RT info:eu-repo/semantics/masterThesis
T1 Design, implementation and evaluation of an acoustic source localization system using Deep Learning techniques
A1 Vera Díaz, Juan Manuel
K1 Acoustic source localization
K1 Microphone arrays
K1 Deep Learning
K1 CNN (Convolutional Neural Network)
K1 Telecomunicaciones
K1 Telecommunication
AB This Master Thesis presents a novel approach for indoor acoustic source localization using microphone arrays, based on a Convolutional Neural Network (CNN) that we call the ASLNet. It directly estimates the three-dimensional position of a single acoustic source using as inputs the raw audio signals from a set of microphones. We use supervised learning methods to train our network end-to-end. The amount of labeled training data available for this problem is, however, small. This Thesis presents a two-step training strategy that mitigates this problem. We first train our network using semi-synthetic data generated from close-talk speech recordings and a mathematical model for signal propagation from the source to the microphones. The amount of semi-synthetic data can be virtually as large as needed. We then fine-tune the resulting network using a small amount of real data. Our experimental results, evaluated on a publicly available dataset recorded in a real room, show that this approach is able to improve on existing localization methods based on SRP-PHAT strategies and also on those presented in very recent proposals based on Convolutional Recurrent Neural Networks (CRNN). In addition, our experiments show that the performance of the ASLNet does not show a relevant dependency on the speaker’s gender, nor on the size of the signal window being used. This work also investigates methods to improve the generalization properties of our network using only semi-synthetic data for training. This is a highly important objective due to the cost of labeling localization data. We proceed by including specific effects in the input signals to force the network to be insensitive to multipath, high noise and distortion likely to be present in real scenarios. We obtain promising results with this strategy, although they still lag behind strategies based on fine-tuning.
YR 2019
FD 2019
LK http://hdl.handle.net/10017/38642
UL http://hdl.handle.net/10017/38642
LA eng
DS MINDS@UW
RD 23-Apr-2024