no code implementations • 22 May 2024 • Zhaojun Guo, Jinghui Lu, Xuejing Liu, Rui Zhao, Zhenxing Qian, Fei Tan
Despite the notable advances achieved by few-shot tuning of pre-trained vision-language (VL) models for downstream tasks, our detailed empirical study shows that few-shot learning outcomes depend significantly on the careful selection of training examples, a facet that previous research has overlooked.
1 code implementation • 29 May 2023 • Xuejing Liu, Wei Tang, Jinghui Lu, Rui Zhao, Zhaojun Guo, Fei Tan
Recent advancements in multimodal foundation models (e.g., CLIP) have excelled in zero-shot generalization.