1 code implementation • 8 May 2024 • Yunxin Li, Baotian Hu, Haoyuan Shi, Wei Wang, Longyue Wang, Min Zhang
Large Multimodal Models (LMMs) have achieved impressive success in visual understanding and reasoning, markedly improving performance on mathematical reasoning in visual contexts.
1 code implementation • 21 Feb 2024 • Yunxin Li, Baotian Hu, Wenhan Luo, Lin Ma, Yuxin Ding, Min Zhang
For this setting, previous methods utilize visual and textual encoders to encode the image and keywords and employ a language model-based decoder to generate the product description.
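To make that pipeline concrete, here is a minimal sketch of such an encode-then-decode architecture; the module names, feature sizes, and layer counts are illustrative assumptions, not the paper's implementation.

```python
import torch
import torch.nn as nn

class ProductDescriber(nn.Module):
    """Visual + keyword encodings feeding a language-model decoder (sketch)."""
    def __init__(self, vocab_size=32000, d_model=512, image_dim=2048):
        super().__init__()
        self.visual_proj = nn.Linear(image_dim, d_model)   # project image features
        self.keyword_emb = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerDecoderLayer(d_model, nhead=8, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers=4)
        self.lm_head = nn.Linear(d_model, vocab_size)

    def forward(self, image_feats, keyword_ids, prefix_emb):
        # Visual and keyword encodings jointly serve as the cross-attention memory.
        memory = torch.cat([self.visual_proj(image_feats),
                            self.keyword_emb(keyword_ids)], dim=1)
        return self.lm_head(self.decoder(prefix_emb, memory))  # next-token logits

model = ProductDescriber()
logits = model(torch.randn(1, 36, 2048),         # 36 image region features
               torch.randint(0, 32000, (1, 5)),  # 5 keyword token ids
               torch.randn(1, 10, 512))          # embedded description prefix
print(logits.shape)  # torch.Size([1, 10, 32000])
```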
no code implementations • 21 Feb 2024 • Yunxin Li, Xinyu Chen, Baotian Hu, Min Zhang
Long video understanding is a significant and ongoing challenge in the intersection of multimedia and artificial intelligence.
no code implementations • 21 Feb 2024 • Yunxin Li, Xinyu Chen, Baotian Hu, Haoyuan Shi, Min Zhang
Evaluating and rethinking the current landscape of Large Multimodal Models (LMMs), we observe that widely used visual-language projection approaches (e.g., Q-Former or MLP) focus on the alignment of image-text descriptions yet ignore visual knowledge-dimension alignment, i.e., connecting visuals to their relevant knowledge.
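For reference, an MLP-style projector of the kind critiqued here looks roughly like the sketch below; the dimensions are assumptions (e.g., ViT patch features into a 4096-dim LLM embedding space), and note that nothing in it explicitly models knowledge-dimension alignment.

```python
import torch
import torch.nn as nn

class MLPProjector(nn.Module):
    """Two-layer MLP mapping vision features into the LLM embedding space."""
    def __init__(self, vision_dim=1024, llm_dim=4096):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(vision_dim, llm_dim),
            nn.GELU(),
            nn.Linear(llm_dim, llm_dim),
        )

    def forward(self, patch_feats):          # (batch, n_patches, vision_dim)
        return self.net(patch_feats)         # (batch, n_patches, llm_dim)

proj = MLPProjector()
visual_tokens = proj(torch.randn(1, 256, 1024))  # e.g., ViT patch features
print(visual_tokens.shape)                       # torch.Size([1, 256, 4096])
```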
no code implementations • 27 Dec 2023 • Yunxin Li, Fan Liu, Zhen Du, Weijie Yuan, Qingjiang Shi, Christos Masouros
In this study, we propose novel frame structures that incorporate ISAC signals for three crucial stages in the NR-V2X system: initial access, connected mode, and beam failure and recovery.
no code implementations • 27 Nov 2023 • Yunxin Li, Baotian Hu, Wei Wang, Xiaochun Cao, Min Zhang
These models predominantly map visual information into the language representation space, leveraging the vast knowledge and powerful text generation abilities of LLMs to produce multimodal instruction-following responses.
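A hedged sketch of this recipe: project visual features into the LLM embedding space and prepend them to the instruction embeddings before generation. Shapes (and the projector) are assumptions carried over from the sketch above.

```python
import torch

llm_dim = 4096
visual_tokens = torch.randn(1, 256, llm_dim)  # projector output (see sketch above)
text_embeds = torch.randn(1, 32, llm_dim)     # embedded instruction tokens

# The LLM attends over the joint sequence and generates the response;
# with HuggingFace models this is typically passed as inputs_embeds.
inputs_embeds = torch.cat([visual_tokens, text_embeds], dim=1)
print(inputs_embeds.shape)                    # torch.Size([1, 288, 4096])
```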
no code implementations • 13 Nov 2023 • Yunxin Li, Longyue Wang, Baotian Hu, Xinyu Chen, Wanqi Zhong, Chenyang Lyu, Wei Wang, Min Zhang
The emergence of multimodal large models (MLMs) has significantly advanced the field of visual understanding, offering remarkable capabilities in the realm of visual question answering (VQA).
no code implementations • 16 Aug 2023 • Fuwang Dong, Fan Liu, Yuanhao Cui, Shihang Lu, Yunxin Li
Sensing-as-a-service is anticipated to be the core feature of 6G perceptive mobile networks (PMN), where high-precision real-time sensing will become an inherent capability rather than an auxiliary function, as in previous generations.
no code implementations • 15 Jun 2023 • Zilin Du, Yunxin Li, Xu Guo, Yidan Sun, Boyang Li
Contemporary news reporting increasingly features multimedia content, motivating research on multimedia event extraction.
1 code implementation • 8 May 2023 • Yunxin Li, Baotian Hu, Xinyu Chen, Yuxin Ding, Lin Ma, Min Zhang
This makes the language model well suited to such multimodal reasoning scenarios over joint textual and visual clues.
1 code implementation • 5 May 2023 • Yunxin Li, Baotian Hu, Xinyu Chen, Lin Ma, Yong Xu, Min Zhang
LMEye addresses this issue by allowing the LLM to request the desired visual information aligned with various human instructions, which we term dynamic visual information interaction.
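A rough, non-authoritative sketch of such request-driven interaction, using generic cross-attention rather than LMEye's released code: query vectors standing in for the LLM's request attend over frozen image features to fetch instruction-relevant visual evidence.

```python
import torch
import torch.nn as nn

d = 512
request = torch.randn(1, 4, d)        # "request" vectors derived from the LLM side
image_feats = torch.randn(1, 256, d)  # frozen visual encoder outputs

# Instruction-conditioned queries fetch only the visual evidence they need.
xattn = nn.MultiheadAttention(embed_dim=d, num_heads=8, batch_first=True)
visual_answer, _ = xattn(query=request, key=image_feats, value=image_feats)
print(visual_answer.shape)            # torch.Size([1, 4, 512]); fed back to the LLM
```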
1 code implementation • 3 May 2023 • Yunxin Li, Baotian Hu, Yuxin Ding, Lin Ma, Min Zhang
Inspired by the divide-and-conquer algorithm and dual-process theory, we regard linguistically complex texts as compound propositions composed of multiple simple proposition sentences, and we propose an end-to-end Neural Divide-and-Conquer Reasoning framework, dubbed NDCR.
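A toy sketch of the divide-and-conquer idea (not NDCR's neural modules): split a compound statement into simple propositions, verify each against the context, and conjoin the verdicts. The clause splitter and lexical-overlap verifier below are stand-ins for the paper's learned components.

```python
import re

def divide(compound_text: str):
    # Toy divide step: split on connectives and punctuation into simple clauses.
    return [c.strip() for c in re.split(r"\band\b|\bwhile\b|[;,]", compound_text)
            if c.strip()]

def verify(proposition: str, context: str) -> float:
    # Stand-in for a neural verifier: lexical overlap as a plausibility score.
    p, c = set(proposition.lower().split()), set(context.lower().split())
    return len(p & c) / max(len(p), 1)

def conquer(compound_text: str, context: str, threshold=0.5) -> bool:
    # Combine per-proposition verdicts; here a simple conjunction.
    return all(verify(p, context) >= threshold for p in divide(compound_text))

ctx = "a man in a red coat walks a small dog in the snow"
print(conquer("a man wears a red coat, and a small dog walks in the snow", ctx))
```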
no code implementations • 1 May 2023 • Zhen Du, Fan Liu, Yunxin Li, Weijie Yuan, Yuanhao Cui, Zenghui Zhang, Christos Masouros, Bo Ai
Connected and autonomous vehicle (CAV) networks face several challenges, such as low throughput, high latency, and poor localization accuracy.
no code implementations • 30 Jan 2023 • Yunxin Li, Fan Liu, Zhen Du, Weijie Yuan, Christos Masouros
The emergence of the fifth-generation (5G) New Radio (NR) brings additional possibilities to vehicle-to-everything (V2X) networks with improved quality of service.
1 code implementation • 23 Jul 2022 • Qian Yang, Yunxin Li, Baotian Hu, Lin Ma, Yuxin Ding, Min Zhang
The proposed method contains a Chunk-aware Semantic Interactor (CSI), a relation inferrer, and a Lexical Constraint-aware Generator (LeCG).
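Lexically constrained generation of this flavor can be illustrated with HuggingFace transformers' generic constrained beam search; this is off-the-shelf tooling, not CALeC's generator, and t5-small is an arbitrary stand-in model.

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("t5-small")
model = AutoModelForSeq2SeqLM.from_pretrained("t5-small")

inputs = tok("summarize: the premise shows a dog running on the beach",
             return_tensors="pt")
# Require the words "dog" and "beach" to appear in the generated text.
force = tok(["dog", "beach"], add_special_tokens=False).input_ids
out = model.generate(**inputs, force_words_ids=force,
                     num_beams=4, max_new_tokens=30)
print(tok.decode(out[0], skip_special_tokens=True))
```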
no code implementations • 17 Jun 2022 • Yu Zhao, Xinshuo Hu, Yunxin Li, Baotian Hu, Dongfang Li, Sichao Chen, Xiaolong Wang
In this paper, we propose a general Multi-Skill Dialog Framework, namely MSDF, which can be applied in different dialog tasks (e.g., knowledge-grounded dialog and persona-based dialog).
no code implementations • 17 Jun 2022 • Yu Zhao, Yunxin Li, Yuxiang Wu, Baotian Hu, Qingcai Chen, Xiaolong Wang, Yuxin Ding, Min Zhang
To mitigate this problem, we propose a medical response generation model with Pivotal Information Recalling (MedPIR), which is built on two components, i.e., a knowledge-aware dialogue graph encoder and a recall-enhanced generator.
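A toy illustration of pivotal-information recalling: score past utterances for relevance to the current query, keep the top ones, and condition the generator on them rather than the full history. The overlap scorer below is a stand-in for the knowledge-aware dialogue graph encoder.

```python
def recall(history, query, k=2):
    """Return the k history utterances most relevant to the query (toy scorer)."""
    def score(utt):
        return len(set(utt.lower().split()) & set(query.lower().split()))
    return sorted(history, key=score, reverse=True)[:k]

history = [
    "I have had a dry cough for two weeks.",
    "The weather here has been terrible.",
    "I also feel chest tightness at night.",
]
query = "Could the cough and chest tightness be related?"

# Pivotal utterances are recalled and prepended for the response generator.
prompt = " ".join(recall(history, query)) + " " + query
print(prompt)
```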
no code implementations • 4 Jul 2021 • Yunxin Li, Qian Yang, Qingcai Chen, Lin Ma, Baotian Hu, Xiaolong Wang, Yuxin Ding
Single online handwritten Chinese character recognition (single OLHCCR) has achieved prominent performance.
no code implementations • 1 Jul 2021 • Yunxin Li, Yu Zhao, Baotian Hu, Qingcai Chen, Yang Xiang, Xiaolong Wang, Yuxin Ding, Lin Ma
Previous works indicate that the glyph of Chinese characters contains rich semantic information and has the potential to enhance the representation of Chinese characters.