1 code implementation • 22 Dec 2023 • Haozhan Shen, Tiancheng Zhao, Mingwei Zhu, Jianwei Yin
Visual grounding, a crucial vision-language task that involves understanding the visual context based on a query expression, requires the model to capture the interactions between objects, as well as various spatial and attribute information.
1 code implementation • 1 Jul 2022 • Tiancheng Zhao, Tianqi Zhang, Mingwei Zhu, Haozhan Shen, Kyusong Lee, Xiaopeng Lu, Jianwei Yin
Inspired by CheckList for testing natural language processing, we introduce VL-CheckList, a novel framework for understanding the capabilities of VLP models.