no code implementations • 22 Dec 2022 • Yutaro Yamada, Yingtian Tang, Yoyo Zhang, Ilker Yildirim
Large-scale vision-language models such as CLIP have shown impressive performance on zero-shot image classification and image-to-text retrieval.
Attribute, Image Classification