1 code implementation • 16 Jul 2022 • Yuqi Liu, Pengfei Xiong, Luhui Xu, Shengming Cao, Qin Jin
In this paper, we propose Token Shift and Selection Network (TS2-Net), a novel token shift and selection transformer architecture that dynamically adjusts the token sequence and selects informative tokens in both the temporal and spatial dimensions of input video samples.
Ranked #8 on Video Retrieval on VATEX
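The two operations named above can be illustrated with a small sketch. It assumes video tokens laid out as [frames][tokens][channels] and uses a generic TSM-style channel shift (a fraction of channels taken from the previous frame, another fraction from the next frame) plus top-k selection by a hypothetical score (the squared L2 norm). TS2-Net's actual shift scheme and its learned token scorer may differ; the function names are illustrative only.

```python
from typing import List

Frame = List[List[float]]   # [tokens][channels]
Video = List[Frame]         # [frames][tokens][channels]


def temporal_token_shift(x: Video, fold_div: int = 4) -> Video:
    """Generic temporal shift (assumption, not the TS2-Net code).

    Channels [0, fold) are taken from the previous frame, channels
    [fold, 2*fold) from the next frame, the rest stay in place.
    Missing neighbours at the clip boundaries are zero-padded.
    """
    num_frames, num_tokens, num_ch = len(x), len(x[0]), len(x[0][0])
    fold = num_ch // fold_div
    out = [[[0.0] * num_ch for _ in range(num_tokens)] for _ in range(num_frames)]
    for t in range(num_frames):
        for n in range(num_tokens):
            for c in range(num_ch):
                if c < fold:
                    out[t][n][c] = x[t - 1][n][c] if t > 0 else 0.0
                elif c < 2 * fold:
                    out[t][n][c] = x[t + 1][n][c] if t < num_frames - 1 else 0.0
                else:
                    out[t][n][c] = x[t][n][c]
    return out


def select_top_k_tokens(frame: Frame, k: int) -> Frame:
    """Keep the k highest-scoring tokens of one frame, in original order.

    The score here is a hypothetical stand-in (squared L2 norm); a real
    model would use a learned scoring module.
    """
    scores = [sum(v * v for v in tok) for tok in frame]
    keep = sorted(range(len(frame)), key=lambda i: scores[i], reverse=True)[:k]
    return [frame[i] for i in sorted(keep)]


if __name__ == "__main__":
    video = [[[float(t)] * 4 for _ in range(2)] for t in range(3)]
    shifted = temporal_token_shift(video)
    print(shifted[1][0])  # channel 0 from frame 0, channel 1 from frame 2
    print(select_top_k_tokens([[1.0, 0.0], [3.0, 4.0], [0.0, 2.0]], k=2))
```

Shifting only a fraction of the channels lets each token see neighbouring frames without adding parameters, while selection drops low-scoring tokens before further processing.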
1 code implementation • 21 Jun 2021 • Han Fang, Pengfei Xiong, Luhui Xu, Yu Chen
We present the CLIP2Video network, which transfers an image-language pre-trained model to video-text retrieval in an end-to-end manner.
Ranked #11 on Video Retrieval on VATEX (using extra training data)
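As a rough illustration of how an image-language model can be reused for video-text retrieval, the sketch below assumes per-frame embeddings and a text embedding have already been produced by an image-text encoder such as CLIP. It mean-pools the frame embeddings into one video vector and ranks videos by cosine similarity to the query. This is a generic baseline, not CLIP2Video's own temporal modeling, and the function names are hypothetical.

```python
import math
from typing import List, Sequence


def mean_pool(frames: Sequence[Sequence[float]]) -> List[float]:
    """Average per-frame embeddings into a single video embedding."""
    n = len(frames)
    return [sum(f[d] for f in frames) / n for d in range(len(frames[0]))]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return 0.0 if na == 0.0 or nb == 0.0 else dot / (na * nb)


def rank_videos(text_emb: Sequence[float],
                videos: Sequence[Sequence[Sequence[float]]]) -> List[int]:
    """Return video indices sorted from most to least similar to the text."""
    scores = [cosine(text_emb, mean_pool(v)) for v in videos]
    return sorted(range(len(videos)), key=lambda i: scores[i], reverse=True)


if __name__ == "__main__":
    query = [1.0, 0.0]
    clips = [
        [[0.0, 1.0], [0.0, 1.0]],
        [[1.0, 0.0], [1.0, 0.2]],
        [[1.0, 1.0], [1.0, 1.0]],
    ]
    print(rank_videos(query, clips))
```

Mean pooling discards frame order, which is exactly the kind of temporal information a dedicated video-text model tries to recover.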