no code implementations • 31 Mar 2022 • Piyush Singh Pasi, Shubham Nemani, Preethi Jyothi, Ganesh Ramakrishnan
We focus on the audio-visual video parsing (AVVP) problem that involves detecting audio and visual event labels with temporal boundaries.