Biologically-inspired auditory artificial intelligence for speech recognition in multi-talker environments

Date
2020
Authors
Grasse, Lukas Walter Neufeld
Publisher
Lethbridge, Alta. : University of Lethbridge, Dept. of Neuroscience
Abstract
Understanding speech in the presence of distracting talkers is a difficult computational problem known as the cocktail party problem. Motivated by auditory processing in the human brain, this thesis developed a neural network that isolates the speech of a single talker from binaural input containing a target talker and multiple distractors. The network is called the Binaural Speaker Isolation FFTNet, or BSINet for short. To compare BSINet with human participants on recognizing the target talker's speech as the number of distractors varies, a "cocktail party" dataset was designed and made available online. Evaluated with the word error rate (WER) metric, BSINet performs comparably to the human participants. BSINet is thus a significant advance toward solving the challenging cocktail party problem.
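For readers unfamiliar with the evaluation metric named in the abstract, the following is a minimal sketch of the standard word error rate definition: the word-level edit distance (substitutions, deletions, and insertions) divided by the number of words in the reference transcript. The function name `word_error_rate` and the whitespace tokenization are assumptions made for illustration; this record does not describe the scoring pipeline or text normalization actually used in the thesis.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Hypothetical helper for illustration only; tokenization is a plain
    whitespace split with no case folding or punctuation handling.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One word missing from a six-word reference.
    print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))
```

Under this definition, the same function can score both a network's transcript and a human participant's transcript against the same reference, which is what makes the two directly comparable. WER can exceed 1.0 when the hypothesis contains many inserted words.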
Keywords
Speech Recognition, Denoising, Speaker Isolation, Cocktail Party Problem, Auditory selective attention, Neural networks (Computer science), Speech perception, Automatic speech recognition, Directional hearing, Auditory perception, Dissertations, Academic