Video Captioning on MSR-VTT

1 paper with code • 0 benchmarks • 0 datasets

Video captioning on MSR-VTT is the task of generating a natural-language description of the content of a short video clip from the MSR-VTT dataset, in which each clip is paired with multiple human-written reference captions. Systems are typically scored against these references with standard captioning metrics such as BLEU-4, METEOR, ROUGE-L, and CIDEr.

Most implemented papers

COSA: Concatenated Sample Pretrained Vision-Language Foundation Model

txh-mercury/cosa 15 Jun 2023

Due to the limited scale and quality of video-text training corpora, most vision-language foundation models employ image-text datasets for pretraining and primarily focus on modeling visually semantic representations while disregarding temporal semantic representations and correlations.
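The excerpt above only states the motivation; the paper title suggests the core idea is to concatenate several independent image-text samples into one pseudo video-paragraph sample, so that abundant image-text data can supply temporally ordered multi-frame inputs. The sketch below illustrates that general idea only; it is not the authors' implementation, and all names (`ImageTextPair`, `ConcatenatedSample`, `make_concatenated_sample`, `group_size`) are hypothetical.

```python
# Minimal, hypothetical sketch of concatenated-sample construction:
# several image-text pairs are grouped, their images treated as pseudo
# "frames" and their captions joined into a pseudo paragraph.
# This is an assumption about the general technique, not COSA's actual API.
import random
from dataclasses import dataclass
from typing import List


@dataclass
class ImageTextPair:
    image_id: str   # identifier of a single image
    caption: str    # its paired text description


@dataclass
class ConcatenatedSample:
    image_ids: List[str]   # ordered pseudo frames built from independent images
    paragraph: str         # captions joined into a paragraph-level description


def make_concatenated_sample(pairs: List[ImageTextPair],
                             group_size: int = 4,
                             seed: int = 0) -> ConcatenatedSample:
    """Randomly pick `group_size` image-text pairs and concatenate them
    into one pseudo video-text training sample."""
    rng = random.Random(seed)
    chosen = rng.sample(pairs, k=group_size)
    return ConcatenatedSample(
        image_ids=[p.image_id for p in chosen],
        paragraph=" ".join(p.caption for p in chosen),
    )


if __name__ == "__main__":
    corpus = [ImageTextPair(f"img_{i}", f"caption of image {i}") for i in range(10)]
    sample = make_concatenated_sample(corpus, group_size=3)
    print(sample.image_ids)
    print(sample.paragraph)
```

Under this reading, the appeal of the design is that paragraph-level pseudo samples expose the model to longer, multi-segment inputs without requiring a large curated video-text corpus; the actual sampling and pretraining objectives should be taken from the paper and the txh-mercury/cosa repository.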