1 code implementation • 18 Feb 2024 • Muyang He, Yexin Liu, Boya Wu, Jianhao Yuan, Yueze Wang, Tiejun Huang, Bo Zhao
Multimodal Large Language Models (MLLMs) have demonstrated notable capabilities in general visual understanding and reasoning tasks.
2 code implementations • 9 Jul 2023 • Bo Zhao, Boya Wu, Muyang He, Tiejun Huang
Thanks to the emergence of foundation models, large language and vision models are integrated to acquire multimodal abilities such as visual captioning and question answering.
1 code implementation • 8 Jun 2023 • Muyang He, Shuo Yang, Tiejun Huang, Bo Zhao
The state of the art of many learning tasks, e.g., image classification, is advanced by collecting larger datasets and then training larger models on them.