no code implementations • 25 Oct 2023 • Asmar Nadeem, Adrian Hilton, Robert Dawes, Graham Thomas, Armin Mustafa
In the context of Audio-Visual Question Answering (AVQA) tasks, the audio-visual modalities can be learnt on three levels: 1) Spatial, 2) Temporal, and 3) Semantic.
Ranked #3 on Audio-visual Question Answering on MUSIC-AVQA
Audio-visual Question Answering • Audio-Visual Question Answering (AVQA)
no code implementations • 26 Mar 2023 • Asmar Nadeem, Adrian Hilton, Robert Dawes, Graham Thomas, Armin Mustafa
Generating grammatically and semantically correct captions in video captioning is a challenging task.
Ranked #13 on Video Captioning on MSVD