no code implementations • 10 May 2024 • Yujuan Ding, Wenqi Fan, Liangbo Ning, Shijie Wang, Hengyun Li, Dawei Yin, Tat-Seng Chua, Qing Li
Given RAG's strength in supplying up-to-date and helpful auxiliary information, retrieval-augmented large language models have emerged that harness external, authoritative knowledge bases, rather than relying solely on the model's internal knowledge, to improve the generation quality of LLMs.
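The retrieve-then-generate recipe this survey covers can be sketched in a few lines. Everything below is illustrative, not the paper's implementation: the word-overlap scorer is a toy stand-in for a dense or sparse retriever, and the prompt template is hypothetical.

```python
# Minimal sketch of a generic RAG pipeline: retrieve evidence from an
# external knowledge base, then prepend it to the LLM prompt.

def retrieve(query, knowledge_base, k=2):
    """Rank documents by word overlap with the query (a stand-in for a
    real retriever) and return the top-k."""
    q_terms = set(query.lower().split())
    scored = sorted(
        knowledge_base,
        key=lambda doc: len(q_terms & set(doc.lower().split())),
        reverse=True,
    )
    return scored[:k]

def build_augmented_prompt(query, knowledge_base, k=2):
    """Prepend retrieved evidence so the LLM grounds its answer in
    external knowledge instead of parametric memory alone."""
    context = "\n".join(f"- {doc}" for doc in retrieve(query, knowledge_base, k))
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"

kb = [
    "The Eiffel Tower is located in Paris.",
    "Mount Everest is the highest mountain above sea level.",
    "Paris is the capital of France.",
]
prompt = build_augmented_prompt("Where is the Eiffel Tower located?", kb)
```

In a production system the overlap scorer would be replaced by an embedding model plus a vector index, but the control flow stays the same: retrieve, assemble context, generate.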
no code implementations • 23 Apr 2024 • Wenqi Fan, Shijie Wang, Jiani Huang, Zhikai Chen, Yu Song, Wenzhuo Tang, Haitao Mao, Hui Liu, Xiaorui Liu, Dawei Yin, Qing Li
Meanwhile, graphs, especially knowledge graphs, are rich in reliable factual knowledge, which can be utilized to enhance the reasoning capabilities of LLMs and potentially alleviate their limitations such as hallucinations and the lack of explainability.
no code implementations • 12 Mar 2024 • Jiahao Zhang, Lin Wang, Shijie Wang, Wenqi Fan
Graph Neural Networks (GNNs) have achieved remarkable success in various real-world applications.
1 code implementation • 12 Mar 2024 • Junda Cheng, Wei Yin, Kaixuan Wang, Xiaozhi Chen, Shijie Wang, Xin Yang
In this work, we propose a new robustness benchmark to evaluate depth estimation systems under various noisy pose settings.
Ranked #1 on Monocular Depth Estimation on DDAD
no code implementations • 27 Feb 2024 • Yonghan Li, Chenyu Wu, Taoran Wu, Shijie Wang, Bai Xue
In this paper, we investigate the problem of verifying the finite-time safety of continuous-time perturbed deterministic systems represented by ordinary differential equations in the presence of measurable disturbances.
no code implementations • 5 Dec 2023 • Zhangyang Xiong, Chenghong Li, Kenkun Liu, Hongjie Liao, Jianqiao Hu, Junyi Zhu, Shuliang Ning, Lingteng Qiu, Chongjie Wang, Shijie Wang, Shuguang Cui, Xiaoguang Han
In this era, the success of large language models and text-to-image models can be attributed to the driving force of large-scale datasets.
1 code implementation • 22 Nov 2023 • Shijie Wang, Qi Zhao, Minh Quan Do, Nakul Agarwal, Kwonjoon Lee, Chen Sun
What makes good video representations for video understanding, such as anticipating future activities, or answering video-conditioned questions?
no code implementations • 13 Nov 2023 • Wenqi Fan, Shijie Wang, Xiao-Yong Wei, Xiaowei Mei, Qing Li
To perform untargeted attacks on social recommender systems, attackers can construct malicious social relationships for fake users to improve attack performance.
1 code implementation • 31 Oct 2023 • Ce Zhang, Changcheng Fu, Shijie Wang, Nakul Agarwal, Kwonjoon Lee, Chiho Choi, Chen Sun
To recognize and predict human-object interactions, we use a Transformer-based neural architecture which allows the "retrieval" of relevant objects for action anticipation at various time scales.
Ranked #3 on Long Term Action Anticipation on Ego4D
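The "retrieval" of relevant objects mentioned above is, mechanically, cross-attention: an anticipation query scores every detected object and pools them by relevance. The sketch below shows that mechanism only; the one-hot object features, shapes, and function names are toy assumptions, not the authors' architecture.

```python
import numpy as np

# Cross-attention as object "retrieval": a query token attends over
# detected-object features and returns a relevance-weighted summary.

def cross_attention(query, object_feats):
    """query: (d,) anticipation token; object_feats: (n, d) detected objects.
    Returns an attention-weighted summary of the objects and the weights."""
    d = query.shape[0]
    scores = object_feats @ query / np.sqrt(d)   # (n,) relevance logits
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                     # softmax over objects
    return weights @ object_feats, weights       # (d,), (n,)

objs = np.eye(5, 8)        # five toy object features (one-hot for clarity)
q = objs[2].copy()         # the anticipated action "asks for" object 2
summary, w = cross_attention(q, objs)
```

With one-hot features the attention weight peaks on object 2, i.e. the query retrieves the object it resembles most.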
2 code implementations • 28 Sep 2023 • Jinze Bai, Shuai Bai, Yunfei Chu, Zeyu Cui, Kai Dang, Xiaodong Deng, Yang Fan, Wenbin Ge, Yu Han, Fei Huang, Binyuan Hui, Luo Ji, Mei Li, Junyang Lin, Runji Lin, Dayiheng Liu, Gao Liu, Chengqiang Lu, Keming Lu, Jianxin Ma, Rui Men, Xingzhang Ren, Xuancheng Ren, Chuanqi Tan, Sinan Tan, Jianhong Tu, Peng Wang, Shijie Wang, Wei Wang, Shengguang Wu, Benfeng Xu, Jin Xu, An Yang, Hao Yang, Jian Yang, Shusheng Yang, Yang Yao, Bowen Yu, Hongyi Yuan, Zheng Yuan, Jianwei Zhang, Xingxuan Zhang, Yichang Zhang, Zhenru Zhang, Chang Zhou, Jingren Zhou, Xiaohuan Zhou, Tianhang Zhu
Large language models (LLMs) have revolutionized the field of artificial intelligence, enabling natural language processing tasks that were previously thought to be exclusive to humans.
Ranked #3 on Multi-Label Text Classification on CC3M-TagMask
1 code implementation • 24 Aug 2023 • Jinze Bai, Shuai Bai, Shusheng Yang, Shijie Wang, Sinan Tan, Peng Wang, Junyang Lin, Chang Zhou, Jingren Zhou
In this work, we introduce the Qwen-VL series, a set of large-scale vision-language models (LVLMs) designed to perceive and understand both texts and images.
Ranked #3 on Visual Question Answering on MM-Vet
1 code implementation • 31 Jul 2023 • Qi Zhao, Shijie Wang, Ce Zhang, Changcheng Fu, Minh Quan Do, Nakul Agarwal, Kwonjoon Lee, Chen Sun
We propose to formulate the LTA task from two perspectives: a bottom-up approach that predicts the next actions autoregressively by modeling temporal dynamics; and a top-down approach that infers the goal of the actor and plans the needed procedure to accomplish the goal.
Ranked #1 on Long Term Action Anticipation on Ego4D
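The bottom-up/top-down contrast above can be made concrete with toy data. This is a schematic of the two formulations, not the paper's models: the transition table stands in for a learned temporal-dynamics model, and the recipe dictionary stands in for goal inference plus procedure planning.

```python
# Bottom-up: roll out next actions autoregressively from observed dynamics.
# Top-down: infer the actor's goal, then plan the remaining procedure.

TRANSITIONS = {"crack egg": "whisk egg", "whisk egg": "heat pan",
               "heat pan": "pour egg", "pour egg": "serve"}
RECIPES = {"make omelette": ["crack egg", "whisk egg", "heat pan",
                             "pour egg", "serve"]}

def bottom_up(history, horizon):
    """Autoregressively predict the next `horizon` actions."""
    preds, cur = [], history[-1]
    for _ in range(horizon):
        cur = TRANSITIONS.get(cur)
        if cur is None:
            break
        preds.append(cur)
    return preds

def top_down(history):
    """Infer the goal consistent with the observed actions, then return
    the unfinished steps (assumes history is a prefix of the procedure)."""
    for goal, steps in RECIPES.items():
        if all(a in steps for a in history):
            return goal, steps[len(history):]
    return None, []

hist = ["crack egg", "whisk egg"]
bu = bottom_up(hist, 3)
goal, plan = top_down(hist)
```

On this toy example both routes agree on the future actions, but they disagree in general: bottom-up extrapolates local dynamics, while top-down is constrained by the inferred goal.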
no code implementations • 5 Jun 2023 • Shijie Wang, Shangbo Wang
In this paper, we propose a Friend-Deep Q-network (Friend-DQN) approach, based on an agent-cooperation scheme, for controlling multiple traffic signals in urban networks.
2 code implementations • 18 May 2023 • Peng Wang, Shijie Wang, Junyang Lin, Shuai Bai, Xiaohuan Zhou, Jingren Zhou, Xinggang Wang, Chang Zhou
In this work, we explore a scalable way for building a general representation model toward unlimited modalities.
Ranked #1 on Semantic Segmentation on ADE20K (using extra training data)
1 code implementation • 16 May 2023 • Junyu Wang, Shijie Wang, Ruijie Zhang, Zengqiang Zheng, Wenyu Liu, Xinggang Wang
We present RND-SCI, a novel framework for compressive hyperspectral image (HSI) reconstruction.
no code implementations • 10 May 2023 • Bruce X. B. Yu, Jianlong Chang, Haixin Wang, Lingbo Liu, Shijie Wang, Zhiyu Wang, Junfan Lin, Lingxi Xie, Haojie Li, Zhouchen Lin, Qi Tian, Chang Wen Chen
With the remarkable development of pre-trained visual foundation models, visual tuning has moved beyond the standard practice of fine-tuning either the entire pre-trained model or only the fully connected layer.
1 code implementation • 24 Jan 2023 • Junyu Wang, Shijie Wang, Wenyu Liu, Zengqiang Zheng, Xinggang Wang
We present a simple, efficient, and scalable unfolding network, SAUNet, to simplify the network design with an adaptive alternate optimization framework for hyperspectral image (HSI) reconstruction.
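The unfolding idea behind networks like this can be illustrated with classic iterative shrinkage: each network "stage" alternates a data-consistency gradient step with a prior step. Everything below is a generic sketch under toy assumptions, not SAUNet itself; in an unfolding network the soft-threshold prior would be replaced by a learned denoiser and the step sizes would be trainable.

```python
import numpy as np

def soft_threshold(x, tau):
    """Stand-in for the learned prior/denoising module of one stage."""
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

def unfolded_reconstruct(y, Phi, stages=100, step=0.1, tau=0.01):
    """Alternate optimization, one iteration per unfolded stage:
    x <- x - step * Phi^T (Phi x - y)   (data-consistency step)
    x <- denoise(x)                     (prior step)."""
    x = Phi.T @ y                       # simple initialization
    for _ in range(stages):
        x = x - step * Phi.T @ (Phi @ x - y)
        x = soft_threshold(x, tau)
    return x

rng = np.random.default_rng(1)
Phi = rng.normal(size=(30, 60)) / np.sqrt(30)        # toy sensing matrix
x_true = np.zeros(60)
x_true[[3, 17, 42]] = [1.0, -0.8, 0.5]               # sparse ground truth
y = Phi @ x_true
x_hat = unfolded_reconstruct(y, Phi)
```

Unrolling a fixed number of such iterations into network layers, with learned parameters per stage, is what makes the design "simple, efficient, and scalable" relative to black-box reconstruction networks.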
no code implementations • CVPR 2023 • Shijie Wang, Jianlong Chang, Haojie Li, Zhihui Wang, Wanli Ouyang, Qi Tian
PLEor leverages the pre-trained CLIP model to infer the discrepancies encompassing both pre-defined and unknown subcategories, termed category-specific discrepancies, and transfers them to a backbone network trained in the closed-set scenario.
no code implementations • 29 Jul 2022 • Shijie Wang, Jianlong Chang, Zhihui Wang, Haojie Li, Wanli Ouyang, Qi Tian
In this paper, we develop Fine-grained Retrieval Prompt Tuning (FRPT), which steers a frozen pre-trained model to perform the fine-grained retrieval task from the perspectives of sample prompting and feature adaptation.
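The general recipe FRPT builds on, steering a frozen backbone through a small learnable prompt, can be sketched as follows. This is a hedged toy: the linear "backbone", the squared loss, and the additive input prompt are illustrative stand-ins, not the paper's sample-prompting or feature-adaptation modules.

```python
import numpy as np

# Prompt tuning in miniature: backbone weights W stay frozen; only the
# prompt vector added to the input is optimized.

rng = np.random.default_rng(0)
W = rng.normal(size=(4, 4))        # frozen pre-trained "backbone" weights
prompt = np.zeros(4)               # learnable prompt (only trainable part)

x = rng.normal(size=4)             # one input sample
target = np.ones(4)                # desired backbone output for this sample

initial_loss = float(np.sum((W @ x - target) ** 2))

lr = 0.01
for _ in range(200):
    out = W @ (x + prompt)                  # prompt perturbs the input
    grad_prompt = 2 * W.T @ (out - target)  # grad of ||out - target||^2 w.r.t. prompt
    prompt -= lr * grad_prompt              # W itself is never updated

final_loss = float(np.sum((W @ (x + prompt) - target) ** 2))
```

The point is the parameter partition: gradients flow through the frozen weights but only the prompt is updated, so the pre-trained model is steered toward the downstream task without being modified.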
2 code implementations • ICCV 2023 • Yuxin Fang, Shusheng Yang, Shijie Wang, Yixiao Ge, Ying Shan, Xinggang Wang
We present an approach to efficiently and effectively adapt a masked image modeling (MIM) pre-trained vanilla Vision Transformer (ViT) for object detection, which is based on our two novel observations: (i) A MIM pre-trained vanilla ViT encoder can work surprisingly well in the challenging object-level recognition scenario even with randomly sampled partial observations, e.g., only 25% to 50% of the input embeddings.
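Observation (i), feeding the encoder only a random subset of input embeddings, amounts to the sampling step sketched below. The function name, the 14x14 patch grid, and the embedding width are illustrative assumptions; the encoder that would consume the kept tokens is a placeholder for a MIM pre-trained ViT.

```python
import numpy as np

def sample_partial_tokens(patch_embeddings, keep_ratio, rng):
    """Keep a random `keep_ratio` fraction of patch embeddings (e.g. 0.25
    to 0.5), returning the kept tokens plus their positions so outputs can
    later be scattered back onto the full grid."""
    n = patch_embeddings.shape[0]
    n_keep = max(1, int(round(n * keep_ratio)))
    idx = np.sort(rng.choice(n, size=n_keep, replace=False))
    return patch_embeddings[idx], idx

rng = np.random.default_rng(0)
tokens = rng.normal(size=(196, 768))   # 14x14 patches of a 224px image
kept, idx = sample_partial_tokens(tokens, keep_ratio=0.25, rng=rng)
```

Because self-attention cost is quadratic in sequence length, keeping 25% of the tokens cuts encoder compute by roughly 16x, which is what makes the observation useful for efficient detection.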
2 code implementations • CVPR 2021 • Ke Li, Shijie Wang, Xiang Zhang, Yifan Xu, Weijian Xu, Zhuowen Tu
Here we utilize the encoder-decoder structure in Transformers to perform regression-based person and keypoint detection that is general-purpose and requires less heuristic design than existing approaches.
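The regression-based idea above, learned queries that directly output coordinates rather than heatmaps, can be sketched in one decoder step. All shapes and names here are toy assumptions: the "decoder" is a single cross-attention over encoder features and the head is one linear layer plus a sigmoid, not the paper's full architecture.

```python
import numpy as np

def softmax(z, axis=-1):
    e = np.exp(z - z.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def decode_keypoints(queries, memory, W_head, b_head):
    """queries: (k, d) learned keypoint queries; memory: (n, d) encoder
    features. Returns (k, 2) normalized coordinates in (0, 1), with no
    heatmaps, argmax decoding, or NMS."""
    attn = softmax(queries @ memory.T / np.sqrt(queries.shape[1]), axis=-1)
    decoded = attn @ memory                  # (k, d) attended features
    coords = decoded @ W_head + b_head       # linear regression head
    return 1.0 / (1.0 + np.exp(-coords))     # sigmoid -> (x, y) in (0, 1)

rng = np.random.default_rng(0)
d, n_tokens, n_keypoints = 32, 50, 17        # e.g. 17 COCO-style keypoints
memory = rng.normal(size=(n_tokens, d))
queries = rng.normal(size=(n_keypoints, d))
W_head, b_head = rng.normal(size=(d, 2)) * 0.1, np.zeros(2)
coords = decode_keypoints(queries, memory, W_head, b_head)
```

This is where the "less heuristic design" claim comes from: coordinates fall out of the head directly, so the heatmap resolution, Gaussian target sigma, and peak post-processing of heatmap-based pipelines all disappear.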
no code implementations • 12 Oct 2020 • Shijie Wang, Zhihui Wang, Haojie Li, Wanli Ouyang
Existing deep learning based weakly supervised fine-grained image recognition (WFGIR) methods usually pick out the discriminative regions from the high-level feature (HLF) maps directly.
no code implementations • ECCV 2020 • Lichang Chen, Guosheng Lin, Shijie Wang, Qingyao Wu
Scene graphs, as a vital tool for bridging the gap between the language and image domains, have been widely adopted in cross-modality tasks like VQA.
no code implementations • 3 Mar 2020 • Chongwei Liu, Zhihui Wang, Shijie Wang, Tao Tang, Yulong Tao, Caifei Yang, Haojie Li, Xing Liu, Xin Fan
We also propose a novel Poisson-blending Generative Adversarial Network (Poisson GAN) and an efficient object detection network (AquaNet) to address two common issues within related datasets: the class-imbalance problem and the prevalence of small objects, respectively.
no code implementations • AAAI 2020 • Zhihui Wang, Shijie Wang, Haojie Li, Zhi Dou, Jianjun Li
The key of Weakly Supervised Fine-grained Image Classification (WFGIC) is how to pick out the discriminative regions and learn the discriminative features from them.
Ranked #25 on Fine-Grained Image Classification on FGVC Aircraft