no code implementations • 16 May 2024 • Charles Raude, K R Prajwal, Liliane Momeni, Hannah Bull, Samuel Albanie, Andrew Zisserman, Gül Varol
To this end, we introduce a multi-task Transformer model, CSLR2, that ingests a signing sequence and outputs embeddings in a joint space shared between signed language and spoken language text.
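For intuition, here is a minimal sketch of how such a joint video-text embedding space is commonly trained with a symmetric contrastive (InfoNCE/CLIP-style) loss. The encoders producing the embeddings, the function name, and the temperature value are illustrative assumptions, not CSLR2's actual implementation.

```python
import torch
import torch.nn.functional as F

def contrastive_joint_embedding_loss(video_emb, text_emb, temperature=0.07):
    """Symmetric contrastive loss pulling matched video/text pairs together.

    video_emb, text_emb: (batch, dim) outputs of two hypothetical encoders.
    """
    v = F.normalize(video_emb, dim=-1)
    t = F.normalize(text_emb, dim=-1)
    logits = v @ t.T / temperature          # (batch, batch) similarity matrix
    targets = torch.arange(v.size(0), device=v.device)
    # Matched pairs lie on the diagonal; contrast in both directions.
    return (F.cross_entropy(logits, targets) +
            F.cross_entropy(logits.T, targets)) / 2
```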
1 code implementation • 16 Nov 2022 • K R Prajwal, Hannah Bull, Liliane Momeni, Samuel Albanie, Gül Varol, Andrew Zisserman
Through extensive evaluations, we verify the effectiveness of both our automatic annotation method and our model architecture.
no code implementations • 1 Sep 2022 • Sindhu B Hegde, K R Prajwal, Rudrabha Mukhopadhyay, Vinay P Namboodiri, C. V. Jawahar
With the help of multiple powerful discriminators that guide the training process, our generator learns to synthesize speech sequences in any voice for the lip movements of any person.
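As a rough illustration of training a generator against several discriminators, the sketch below sums an adversarial loss from each critic into a single generator objective. All module names, the critic roles, and the loss form are hypothetical placeholders, not the paper's code.

```python
import torch
import torch.nn.functional as F

def generator_step(generator, discriminators, lip_frames, optimizer):
    """One generator update guided by multiple discriminators."""
    optimizer.zero_grad()
    fake_speech = generator(lip_frames)        # e.g. a predicted mel-spectrogram
    loss = 0.0
    for disc in discriminators:                # e.g. sync / quality / identity critics
        score = disc(fake_speech, lip_frames)  # assumed to output a probability in (0, 1)
        # The generator is rewarded when each critic judges its output real.
        loss = loss + F.binary_cross_entropy(score, torch.ones_like(score))
    loss.backward()
    optimizer.step()
    return loss.item()
```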
no code implementations • 4 Aug 2022 • Liliane Momeni, Hannah Bull, K R Prajwal, Samuel Albanie, Gül Varol, Andrew Zisserman
Recently, sign language researchers have turned to sign language interpreted TV broadcasts, comprising (i) a video of continuous signing and (ii) subtitles corresponding to the audio content, as a readily available and large-scale source of training data.
1 code implementation • 29 Oct 2021 • K R Prajwal, Liliane Momeni, Triantafyllos Afouras, Andrew Zisserman
In this paper, we consider the task of spotting spoken keywords in silent video sequences -- also known as visual keyword spotting.
Ranked #1 on Visual Keyword Spotting on LRS2
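To make the task concrete, here is a toy sketch of keyword localization: score a keyword embedding against per-frame visual speech features and threshold the peak similarity. The encoders producing these tensors are assumed, and this is a generic sketch rather than the paper's method.

```python
import torch
import torch.nn.functional as F

def spot_keyword(video_features, keyword_embedding, threshold=0.5):
    """Return (is_present, frame_index) for a keyword in a silent video.

    video_features: (num_frames, dim) per-frame visual speech features.
    keyword_embedding: (dim,) embedding of the query keyword.
    """
    sims = F.cosine_similarity(                    # (num_frames,)
        video_features, keyword_embedding.unsqueeze(0), dim=-1)
    probs = torch.sigmoid(sims)                    # squash to a (0, 1) score
    best = int(probs.argmax())                     # most likely occurrence
    return probs[best].item() > threshold, best
```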
no code implementations • CVPR 2022 • K R Prajwal, Triantafyllos Afouras, Andrew Zisserman
To this end, we make the following contributions: (1) we propose an attention-based pooling mechanism to aggregate visual speech representations; (2) we use sub-word units for lip reading for the first time and show that this allows us to better model the ambiguities of the task; (3) we propose a model for Visual Speech Detection (VSD), trained on top of the lip reading network.
Ranked #1 on Visual Speech Recognition on LRS2 (using extra training data)
Tasks: Audio-Visual Active Speaker Detection, Automatic Speech Recognition, +5
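The attention-based pooling of contribution (1) can be sketched in a few lines: learn a scalar relevance score per frame and take a softmax-weighted average of the frame features. This is a minimal sketch of the general mechanism, not the authors' implementation.

```python
import torch
import torch.nn as nn

class AttentionPooling(nn.Module):
    """Pool a sequence of visual speech features (frames, dim) into one vector."""
    def __init__(self, dim):
        super().__init__()
        self.score = nn.Linear(dim, 1)   # one scalar relevance score per frame

    def forward(self, feats):            # feats: (batch, frames, dim)
        weights = torch.softmax(self.score(feats), dim=1)  # (batch, frames, 1)
        return (weights * feats).sum(dim=1)                # weighted average
```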
1 code implementation • 20 Dec 2020 • Sindhu B Hegde, K R Prajwal, Rudrabha Mukhopadhyay, Vinay Namboodiri, C. V. Jawahar
In this work, we re-think the task of speech enhancement in unconstrained real-world environments.
Ranked #1 on Speech Denoising on LRS3+VGGSound
4 code implementations • 23 Aug 2020 • K R Prajwal, Rudrabha Mukhopadhyay, Vinay Namboodiri, C. V. Jawahar
However, they fail to accurately morph the lip movements of arbitrary identities in dynamic, unconstrained talking face videos, resulting in significant parts of the video being out-of-sync with the new audio.
Ranked #1 on Unconstrained Lip-synchronization on LRS3 (using extra training data)
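Wav2Lip-style systems penalize out-of-sync generations using a pre-trained lip-sync expert that embeds the audio and video windows and compares them. The sketch below shows such a cosine-similarity sync penalty; the encoder architectures and the exact loss scaling are assumptions here.

```python
import torch
import torch.nn.functional as F

def sync_penalty(video_emb, audio_emb, eps=1e-8):
    """Penalize generated frames whose embedding drifts from the audio's.

    video_emb, audio_emb: (batch, dim) outputs of assumed sync-expert encoders.
    """
    sim = F.cosine_similarity(video_emb, audio_emb, dim=-1)  # (batch,)
    prob = sim.clamp(eps, 1.0)        # treat similarity as a sync probability
    return -torch.log(prob).mean()    # low similarity -> large penalty
```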
1 code implementation • CVPR 2020 • K R Prajwal, Rudrabha Mukhopadhyay, Vinay Namboodiri, C. V. Jawahar
In this work, we explore the task of lip to speech synthesis, i.e., learning to generate natural speech given only the lip movements of a speaker.
Ranked #1 on Lip Reading on LRW
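A lip-to-speech system of this kind can be pictured as a visual encoder over the lip-frame sequence followed by a decoder that predicts a mel-spectrogram, which a vocoder would then turn into audio. The sketch below is illustrative only; the 3D-convolution/GRU choice and all layer sizes are assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

class LipToSpeech(nn.Module):
    """Encode lip frames, then decode a mel-spectrogram frame by frame."""
    def __init__(self, feat_dim=512, n_mels=80):
        super().__init__()
        self.frontend = nn.Conv3d(3, feat_dim, kernel_size=(5, 7, 7),
                                  stride=(1, 2, 2), padding=(2, 3, 3))
        self.encoder = nn.GRU(feat_dim, feat_dim, batch_first=True)
        self.to_mel = nn.Linear(feat_dim, n_mels)

    def forward(self, frames):                    # frames: (B, 3, T, H, W)
        x = self.frontend(frames)                 # (B, feat_dim, T, H', W')
        x = x.mean(dim=(-1, -2)).transpose(1, 2)  # spatial pool -> (B, T, feat_dim)
        x, _ = self.encoder(x)                    # temporal modeling
        return self.to_mel(x)                     # (B, T, n_mels) mel-spectrogram
```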