no code implementations • 9 Oct 2021 • Helen L. Bear, Veronica Morfi, Emmanouil Benetos
Sound scene geotagging is a new topic of research which has evolved from acoustic scene classification.
no code implementations • 9 Oct 2021 • David Heise, Helen L. Bear
We analyse multi-purpose audio using tools to visualise similarities within the data that may be observed via unsupervised methods.
1 code implementation • 13 May 2020 • Arjun Pankajakshan, Helen L. Bear, Vinod Subramanian, Emmanouil Benetos
In this paper we investigate the importance of the extent of memory in sequential self attention for sound recognition.
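The idea of limiting the extent of memory in self attention can be sketched as a windowed attention mask, where each frame may only attend to a fixed number of preceding frames. The implementation below is a minimal illustrative sketch, not the model from the paper; the function name, shapes, and masking choices are assumptions.

```python
import numpy as np

def windowed_self_attention(x, window):
    """Single-head self-attention in which each frame attends only to itself
    and the previous `window - 1` frames -- a sketch of restricting the
    'extent of memory' (illustrative only, not the paper's architecture)."""
    T, d = x.shape
    scores = x @ x.T / np.sqrt(d)  # (T, T) pairwise similarity scores
    idx = np.arange(T)
    # Disallow future frames and frames older than the memory window.
    mask = (idx[None, :] > idx[:, None]) | (idx[:, None] - idx[None, :] >= window)
    scores[mask] = -np.inf
    # Numerically stable softmax over the allowed positions in each row.
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ x

frames = np.random.randn(6, 4)
out = windowed_self_attention(frames, window=3)  # same shape as the input
```

Shrinking `window` trades context for locality; with `window=1` each frame attends only to itself and the output equals the input.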
no code implementations • 24 Oct 2018 • Jake Burton, David Frank, Madhi Saleh, Nassir Navab, Helen L. Bear
Lipreading is a difficult gesture classification task.
no code implementations • 8 May 2018 • Kwanchiva Thangthai, Helen L. Bear, Richard Harvey
We compare the performance of a lipreading system by modeling visual speech using either 13 viseme or 38 phoneme units.

no code implementations • 8 May 2018 • Helen L. Bear, Richard Harvey
Not only is this ambiguity damaging to the performance of audio-visual classifiers operating on real expressive speech, but there is also considerable choice between possible mappings.
no code implementations • 8 May 2018 • Helen L. Bear, Richard Harvey
Visual lip gestures observed whilst lipreading have a few working definitions; the two most common are `the visual equivalent of a phoneme' and `phonemes which are indistinguishable on the lips'.
no code implementations • 3 Oct 2017 • Helen L. Bear, Sarah Taylor
This joining of two previously disparate areas, each with its own perspective on computer lipreading, is creating opportunities for collaboration; in doing so, however, the literature faces challenges in knowledge sharing due to inconsistent terminology and the range of methods for scoring results.
no code implementations • 3 Oct 2017 • Helen L. Bear
Benchmarked against SD results and isolated-word performance, we test with RMAV dataset speakers and observe that, with continuous speech, the trajectory between visemes has a greater negative effect on speaker differentiation.
no code implementations • 3 Oct 2017 • Helen L. Bear
For machines to lipread, or understand speech from lip movement, they decode lip motions (known as visemes) into spoken sounds.
no code implementations • 3 Oct 2017 • Helen L. Bear, Richard W. Harvey, Barry-John Theobald, Yuxuan Lan
A critical assumption of all current visual speech recognition systems is that there are visual speech units called visemes which can be mapped to units of acoustic speech, the phonemes.
no code implementations • 3 Oct 2017 • Helen L. Bear, Richard Harvey, Barry-John Theobald, Yuxuan Lan
Visual-only speech recognition is dependent upon a number of factors that can be difficult to control, such as lighting, identity, motion, emotion, and expression.
no code implementations • 3 Oct 2017 • Helen L. Bear
The term "viseme" is used in machine lipreading to represent a visual cue or gesture which corresponds to a subgroup of phonemes where the phonemes are visually indistinguishable.
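Such a viseme can be treated as a many-to-one grouping of phonemes. The mapping below is a small hypothetical example of that structure, not the mapping studied in the paper; the class labels and groupings are assumptions for illustration only.

```python
# Hypothetical many-to-one phoneme-to-viseme grouping (illustrative classes,
# not the paper's mapping): visually indistinguishable phonemes share a label.
PHONEME_TO_VISEME = {
    "p": "V1", "b": "V1", "m": "V1",              # bilabials look alike
    "f": "V2", "v": "V2",                         # labiodentals look alike
    "t": "V3", "d": "V3", "s": "V3", "z": "V3",   # alveolars look alike
}

def to_visemes(phonemes):
    """Collapse a phoneme sequence into its (coarser) viseme sequence."""
    return [PHONEME_TO_VISEME[p] for p in phonemes]

print(to_visemes(["p", "v", "d"]))  # -> ['V1', 'V2', 'V3']
```

Because the mapping is many-to-one, distinct phoneme strings can collapse to the same viseme string, which is one source of the ambiguity machine lipreaders must resolve.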
no code implementations • 3 Oct 2017 • Helen L. Bear, Stephen J. Cox, Richard W. Harvey
In machine lip-reading, which is the identification of speech from visual-only information, there is evidence to show that visual speech is highly dependent upon the speaker [1].
no code implementations • 3 Oct 2017 • Helen L. Bear, Richard W. Harvey, Yuxuan Lan
In machine lip-reading there is continued debate and research around the correct classes to be used for recognition.
no code implementations • 3 Oct 2017 • Helen L. Bear, Richard Harvey
To undertake machine lip-reading, we try to recognise speech from a visual signal.
no code implementations • 3 Oct 2017 • Helen L. Bear, Gari Owen, Richard Harvey, Barry-John Theobald
In the quest for greater computer lip-reading performance there are a number of tacit assumptions which are either present in the datasets (high resolution for example) or in the methods (recognition of spoken visual units called visemes for example).