1 code implementation • 26 Mar 2024 • Jiamian Wang, Guohao Sun, Pichao Wang, Dongfang Liu, Sohail Dianat, Majid Rabbani, Raghuveer Rao, Zhiqiang Tao
Correspondingly, a single text embedding may not be expressive enough to capture the video embedding and support retrieval.
1 code implementation • 17 Mar 2024 • Guohao Sun, Can Qin, Jiamian Wang, Zeyuan Chen, Ran Xu, Zhiqiang Tao
Recent advances in vision-language models have shown notable generalization across vision-language tasks after visual instruction tuning.
Ranked #39 on Visual Question Answering on MM-Vet