Biologically-inspired auditory artificial intelligence for speech recognition in multi-talker environments

Grasse, Lukas Walter Neufeld
Lethbridge, Alta. : University of Lethbridge, Dept. of Neuroscience
Understanding speech in the presence of distracting talkers is a difficult computational problem known as the cocktail party problem. Motivated by auditory processing in the human brain, this thesis developed a neural network that isolates the speech of a single talker from binaural input containing a target talker and multiple distractors. The network is called the Binaural Speaker Isolation FFTNet, or BSINet for short. To compare BSINet with human participants at recognizing the target talker's speech as the number of distractors varies, a "cocktail party" dataset was designed and made available online. Evaluated with the Word-Error-Rate metric, BSINet performs comparably to the human participants. BSINet thus represents a significant advance toward solving the challenging cocktail party problem.
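The Word-Error-Rate metric named in the abstract is conventionally computed as the word-level edit distance between a reference transcript and a recognized transcript (substitutions, deletions, and insertions), divided by the number of words in the reference. The following is a minimal Python sketch of that standard definition, assuming whitespace tokenization and no text normalization. The function name `word_error_rate` is illustrative and is not taken from the thesis, whose exact scoring procedure is not described here.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words matches an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr

    return prev[-1] / len(ref)
```

Under this definition a perfect transcript scores 0, and the score can exceed 1 when the hypothesis contains many inserted words. The same function can score a human listener's typed response and a recognizer's output against the same reference, which is the kind of comparison the abstract describes.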
Speech Recognition, Denoising, Speaker Isolation, Cocktail Party Problem, Auditory selective attention, Neural networks (Computer science), Speech perception, Automatic speech recognition, Directional hearing, Auditory perception, Dissertations, Academic