Video Narrative Grounding
1 paper with code • 0 benchmarks • 1 dataset
Video Narrative Grounding is the task of linking video narratives to specific regions of a video. The input is a video together with a text description (the narrative) in which certain nouns are marked. For each marked noun, the method must output a segmentation mask of the object that noun refers to in every video frame.
Source: Connecting Vision and Language with Video Localized Narratives
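To make the input and output of the task concrete, here is a minimal sketch of the data shapes involved. It assumes that marked nouns are given as character offsets into the narrative and that each mask is a 2-D boolean grid the size of a frame. All class and function names (NarrativeGroundingInput, NarrativeGroundingOutput, validate_output, empty_prediction) are hypothetical and do not reflect the official dataset format. The sketch only checks that a prediction has one mask per marked noun per frame.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical representation: a mask is a 2-D grid of booleans
# (True = pixel belongs to the referred object).
Mask = List[List[bool]]


@dataclass
class NarrativeGroundingInput:
    """Hypothetical task input: video dimensions plus a narrative with marked nouns."""
    num_frames: int
    height: int
    width: int
    narrative: str
    # Assumed encoding: (start, end) character offsets of each marked noun.
    marked_nouns: List[Tuple[int, int]]

    def noun_strings(self) -> List[str]:
        """Return the text of each marked noun, in order."""
        return [self.narrative[s:e] for s, e in self.marked_nouns]


@dataclass
class NarrativeGroundingOutput:
    """Hypothetical task output: masks[i][t] is the mask of noun i in frame t."""
    masks: List[List[Mask]]


def empty_mask(height: int, width: int) -> Mask:
    """An all-False mask, e.g. for a frame where the object is not visible."""
    return [[False] * width for _ in range(height)]


def empty_prediction(inp: NarrativeGroundingInput) -> NarrativeGroundingOutput:
    """A trivial placeholder prediction: an empty mask for every noun and frame."""
    return NarrativeGroundingOutput(
        masks=[
            [empty_mask(inp.height, inp.width) for _ in range(inp.num_frames)]
            for _ in inp.marked_nouns
        ]
    )


def validate_output(inp: NarrativeGroundingInput, out: NarrativeGroundingOutput) -> bool:
    """Check there is one frame-sized mask per marked noun per frame."""
    if len(out.masks) != len(inp.marked_nouns):
        return False
    for per_noun in out.masks:
        if len(per_noun) != inp.num_frames:
            return False
        for mask in per_noun:
            if len(mask) != inp.height or any(len(row) != inp.width for row in mask):
                return False
    return True


if __name__ == "__main__":
    example = NarrativeGroundingInput(
        num_frames=3,
        height=4,
        width=5,
        narrative="A man walks a dog past a bench.",
        marked_nouns=[(2, 5), (14, 17)],  # "man", "dog"
    )
    print(example.noun_strings())
    print(validate_output(example, empty_prediction(example)))
```

The nested list indexed first by noun and then by frame mirrors the task statement ("for each marked noun ... in each video frame"). A real system would more likely store masks as arrays or run-length encodings, but the shape constraint being checked is the same.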
Most implemented papers
Connecting Vision and Language with Video Localized Narratives
We propose Video Localized Narratives, a new form of multimodal video annotations connecting vision and language.