Temporal constraints on human and artificial multi-sensory speech recognition

Perlette, Christopher S.
University of Lethbridge. Faculty of Arts and Science
Lethbridge, Alta. : University of Lethbridge, Dept. of Neuroscience
Audio-visual speech recognition (AVSR) is the process of perceiving and understanding speech using both auditory and visual information. Combining visual information with auditory stimuli has been shown to improve speech recognition relative to purely auditory recognition when the task is performed in adverse conditions with large amounts of distracting noise. This work examines the relationship between auditory and visual speech information and the effect of temporal audio-visual desynchronization on AVSR performance. Using a whole-report task, we show that (1) consistent with prior work, performance declines asymmetrically depending on the direction and magnitude of a temporal lag, and (2) a common, modern architecture for computational AVSR does not show this asymmetry, indicating a fundamental difference between biological and computational AVSR methods.
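The abstract does not say how the temporal lag is imposed. Purely as an illustration, here is a minimal sketch of one way to desynchronize a digitized audio track from its (unchanged) video by a signed lag. The function name `desynchronize`, its parameters, and the sign convention (positive lag delays the audio so it trails the video; negative lag advances it so it leads) are hypothetical assumptions, not the procedure used in the thesis.

```python
def desynchronize(audio, lag_ms, sample_rate):
    """Shift an audio track relative to its unchanged video track.

    Hypothetical sign convention (an assumption, not from the thesis):
    lag_ms > 0 delays the audio so it trails the video;
    lag_ms < 0 advances the audio so it leads the video.
    The output keeps the original length; vacated samples become silence (0.0).
    """
    # Convert the lag in milliseconds to a whole number of samples.
    shift = round(abs(lag_ms) * sample_rate / 1000)
    n = len(audio)
    shift = min(shift, n)  # a lag longer than the clip yields pure silence
    silence = [0.0] * shift
    if lag_ms >= 0:
        # Audio trails video: prepend silence and drop the tail.
        return silence + list(audio[: n - shift])
    # Audio leads video: drop the head and append silence.
    return list(audio[shift:]) + silence


if __name__ == "__main__":
    clip = [1, 2, 3, 4, 5]
    print(desynchronize(clip, 2, 1000))   # audio delayed by 2 samples
    print(desynchronize(clip, -2, 1000))  # audio advanced by 2 samples
```

Because the direction of the lag is encoded in its sign, the same function can generate both audio-leading and audio-trailing conditions, which is the distinction the reported asymmetry depends on. At a 16 kHz sampling rate, for example, a 40 ms lag corresponds to 640 samples.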
machine learning, speech recognition, behavioural, neuroscience, Speech processing systems--Research, Speech perception--Research, Visual perception--Research, Lipreading, Lipreading--Computer simulation, Machine learning, Neurosciences, Dissertations, Academic