no code implementations • 10 Apr 2023 • Shentong Mo, Jingfei Xia, Ihor Markevych
Visual-linguistic pre-training aims to learn joint vision and language representations that can be transferred to downstream visual-linguistic tasks.
Image Retrieval • Phrase Grounding • +6