1 code implementation • 27 Mar 2024 • Run Shao, Zhaoyang Zhang, Chao Tao, Yunsheng Zhang, Chengli Peng, Haifeng Li
Compared to Patch Embed, which requires more than one hundred tokens for one image, HOOK requires only 6 and 8 tokens for sparse and dense tasks, respectively, resulting in efficiency improvements of 1. 5 to 2. 8 times.
no code implementations • 31 Dec 2023 • Run Shao, Cheng Yang, Qiujun Li, Qing Zhu, Yongjun Zhang, Yansheng Li, Yu Liu, Yong Tang, Dapeng Liu, Shizhong Yang, Haifeng Li
We introduce the Language as Reference Framework (LaRF), a fundamental principle for constructing a multimodal unified model, aiming to strike a trade-off between the cohesion and autonomy among different modalities.