2 code implementations • 6 Nov 2023 • Wenke Xia, Dong Wang, Xincheng Pang, Zhigang Wang, Bin Zhao, Di Hu, Xuelong Li
Generalizable articulated object manipulation is essential for home-assistant robots.
1 code implementation • 16 Apr 2023 • Wenke Xia, Xingjian Li, Andong Deng, Haoyi Xiong, Dejing Dou, Di Hu
However, such semantic consistency from the synchronization is hard to guarantee in unconstrained videos, due to the irrelevant modality noise and differentiated semantic correlation.
1 code implementation • 14 Feb 2023 • Wenke Xia, Xu Zhao, Xincheng Pang, Changqing Zhang, Di Hu
We surprisingly find that: the multimodal models with existing imbalance algorithms consistently perform worse than the unimodal one on specific subsets, in accordance with the modality bias.
1 code implementation • 7 Feb 2023 • Ruoxuan Feng, Wenke Xia, Di Hu
Specifically, we explore the effects of pre-trained models on two audio-visual learning scenarios: cross-modal initialization and multi-modal joint learning.
1 code implementation • 14 Jan 2023 • Hongpeng Lin, Ludan Ruan, Wenke Xia, Peiyu Liu, Jingyuan Wen, Yixin Xu, Di Hu, Ruihua Song, Wayne Xin Zhao, Qin Jin, Zhiwu Lu
Experimental results indicate that the models incorporating large language models (LLM) can generate more diverse responses, while the model utilizing knowledge graphs to introduce external knowledge performs the best overall.