1 code implementation • CVPR 2023 • Jianjie Luo, Yehao Li, Yingwei Pan, Ting Yao, Jianlin Feng, Hongyang Chao, Tao Mei
The rich semantics are further regarded as semantic prior to trigger the learning of Diffusion Transformer, which produces the output sentence in a diffusion process.
no code implementations • 14 Dec 2021 • Jianjie Luo, Yehao Li, Yingwei Pan, Ting Yao, Hongyang Chao, Tao Mei
BERT-type structure has led to the revolution of vision-language pre-training and the achievement of state-of-the-art results on numerous vision-language downstream tasks.
no code implementations • 5 Jul 2020 • Yingwei Pan, Yehao Li, Jianjie Luo, Jun Xu, Ting Yao, Tao Mei
In this work, we present Auto-captions on GIF, which is a new large-scale pre-training dataset for generic video understanding.