no code implementations • 7 Jun 2024 • Yuchao Wang, Peirui Cheng, Pengju Tian, Ziyang Yuan, Liangjin Zhao, Jing Tian, Wensheng Wang, Zhirui Wang, Xian Sun
Demand is growing for collaborative perception that combines different viewpoints to construct more comprehensive perceptual information.
1 code implementation • 6 Mar 2021 • Zhenwang Qin, Wensheng Wang, Karl-Heinz Dammer, Leifeng Guo, Zhen Cao
Finally, we integrate them and propose a solution based on a light deep neural network (DNN), called Ag-YOLO, which enables the crop-protection UAV to perform target detection and operate autonomously.
1 code implementation • CVPR 2019 • Chenyou Fan, Xiaofan Zhang, Shu Zhang, Wensheng Wang, Chi Zhang, Heng Huang
In this paper, we propose a novel end-to-end trainable Video Question Answering (VideoQA) framework with three major components: 1) a new heterogeneous memory which can effectively learn global context information from appearance and motion features; 2) a redesigned question memory which helps understand the complex semantics of the question and highlights the queried subjects; and 3) a new multimodal fusion layer which performs multi-step reasoning by attending to relevant visual and textual hints with self-updated attention.
Ranked #30 on Visual Question Answering (VQA) on MSRVTT-QA
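The third component above, multi-step reasoning with self-updated attention over visual and textual hints, can be illustrated with a small sketch. Everything below is a simplified, hypothetical illustration, not the paper's actual architecture: the projection matrices are random stand-ins for trained weights, and the function name `multistep_fusion` is invented for this example.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multistep_fusion(visual, textual, steps=3, seed=0):
    """Hypothetical sketch of multi-step attention fusion.

    visual:  (Nv, d) array of visual hint features
    textual: (Nt, d) array of textual hint features
    Returns a fused (d,) vector refined over `steps` reasoning steps.
    """
    rng = np.random.default_rng(seed)
    d = visual.shape[1]
    # In the real model these projections would be learned; random here
    # purely for illustration.
    Wv = rng.standard_normal((d, d)) / np.sqrt(d)
    Wt = rng.standard_normal((d, d)) / np.sqrt(d)

    # Initialize the query state from a summary of the textual hints.
    state = textual.mean(axis=0)
    for _ in range(steps):
        # Attend to visual and textual hints conditioned on current state.
        av = softmax(visual @ Wv @ state)
        at = softmax(textual @ Wt @ state)
        v_ctx = av @ visual   # attention-weighted visual context
        t_ctx = at @ textual  # attention-weighted textual context
        # Self-update: the fused context refines the query for the next step.
        state = np.tanh(v_ctx + t_ctx + state)
    return state

# Example usage with toy features
vis = np.random.default_rng(1).standard_normal((5, 8))
txt = np.random.default_rng(2).standard_normal((3, 8))
fused = multistep_fusion(vis, txt)
```

The key idea the sketch captures is that the attention query is not fixed: each reasoning step re-weights the hints using the state produced by the previous step.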