1 code implementation • 2 May 2024 • Xiaoqi Zhao, Youwei Pang, Wei Ji, Baicheng Sheng, Jiaming Zuo, Lihe Zhang, Huchuan Lu
Unlike context-independent (CI) concepts such as human, car, and airplane, context-dependent (CD) concepts, such as camouflaged objects and medical lesions, demand a higher level of visual understanding.
no code implementations • 20 Feb 2024 • Qi Bi, Beichen Zhou, Jingjun Yi, Wei Ji, Haolan Zhan, Gui-Song Xia
In this paper, we propose the task of domain generalized oriented object detection, which intends to explore the generalization of oriented object detectors on arbitrary unseen target domains.
no code implementations • 16 Jan 2024 • Qi Bi, Wei Ji, Jingjun Yi, Haolan Zhan, Gui-Song Xia
To comprehensively learn the relation between informative patches and fine-grained semantics, multi-instance knowledge distillation is applied both to the region/image-crop pairs from the teacher and student networks and to the region-image crops inside the teacher/student networks, which we term intra-level and inter-level multi-instance distillation, respectively.
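The intra-/inter-level multi-instance distillation described above can be sketched as a toy loss that averages a KL divergence between teacher and student predictions over each bag of region crops. Everything here — function names, temperature, and logits — is an illustrative assumption, not the paper's implementation.

```python
import math

def softmax(logits, t=1.0):
    """Temperature-scaled softmax over a list of logits."""
    exps = [math.exp(x / t) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl_div(p, q):
    """KL(p || q) between two discrete distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def multi_instance_distill_loss(teacher_bags, student_bags, t=2.0):
    """Average KL between teacher and student predictions over every
    region crop in every bag (one bag of crops per image)."""
    total, count = 0.0, 0
    for t_bag, s_bag in zip(teacher_bags, student_bags):
        for t_logits, s_logits in zip(t_bag, s_bag):
            total += kl_div(softmax(t_logits, t), softmax(s_logits, t))
            count += 1
    return total / count

# Two images, two region crops each, three classes (made-up numbers).
teacher = [[[2.0, 0.5, 0.1], [0.2, 1.5, 0.3]],
           [[1.0, 1.0, 1.0], [0.0, 0.0, 3.0]]]
student = [[[1.8, 0.6, 0.2], [0.1, 1.4, 0.5]],
           [[0.9, 1.1, 1.0], [0.2, 0.1, 2.5]]]
loss = multi_instance_distill_loss(teacher, student)
```

The loss is zero when the student matches the teacher exactly and grows as their crop-level predictions diverge.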
no code implementations • 21 Nov 2023 • Minghe Gao, Juncheng Li, Hao Fei, Liang Pang, Wei Ji, Guoming Wang, Wenqiao Zhang, Siliang Tang, Yueting Zhuang
Visual programming, a modular and generalizable paradigm, integrates different modules and Python operators to solve various vision-language tasks.
no code implementations • 21 Nov 2023 • Meng Chu, Zhedong Zheng, Wei Ji, Tingyu Wang, Tat-Seng Chua
Navigating drones through natural language commands remains challenging due to the dearth of accessible multi-modal datasets and the stringent precision requirements for aligning visual and textual data.
1 code implementation • 8 Nov 2023 • Ao Zhang, Yuan YAO, Wei Ji, Zhiyuan Liu, Tat-Seng Chua
The development of large language models (LLMs) has greatly advanced the field of multimodal understanding, leading to the emergence of large multimodal models (LMMs).
1 code implementation • 12 Oct 2023 • Xiangyan Liu, Rongxue Li, Wei Ji, Tao Lin
The reasoning capabilities of Large Language Models (LLMs) are widely acknowledged in recent research, inspiring studies on tool learning and autonomous agents.
no code implementations • 9 Oct 2023 • Li Li, You Qin, Wei Ji, Yuxiao Zhou, Roger Zimmermann
Panoptic Scene Graph Generation (PSG) involves the detection of objects and the prediction of their corresponding relationships (predicates).
no code implementations • 29 Sep 2023 • Wei Ji, Li Li, Hao Fei, Xiangyan Liu, Xun Yang, Juncheng Li, Roger Zimmermann
Referring Image Segmentation (RIS) has been extensively studied over the past decade, leading to the development of advanced algorithms.
1 code implementation • 11 Sep 2023 • Shengqiong Wu, Hao Fei, Leigang Qu, Wei Ji, Tat-Seng Chua
While recently Multimodal Large Language Models (MM-LLMs) have made exciting strides, they mostly fall prey to the limitation of only input-side multimodal understanding, without the ability to produce content in multiple modalities.
no code implementations • 26 Aug 2023 • Hao Fei, Shengqiong Wu, Wei Ji, Hanwang Zhang, Tat-Seng Chua
In this work, we investigate strengthening the awareness of video dynamics in diffusion models (DMs) for high-quality text-to-video (T2V) generation.
no code implementations • ICCV 2023 • Jiacong Xu, Yi Zhang, Jiawei Peng, Wufei Ma, Artur Jesslen, Pengliang Ji, Qixin Hu, Jiehua Zhang, Qihao Liu, Jiahao Wang, Wei Ji, Chen Wang, Xiaoding Yuan, Prakhar Kaushik, Guofeng Zhang, Jie Liu, Yushan Xie, Yawen Cui, Alan Yuille, Adam Kortylewski
Animal3D consists of 3,379 images collected from 40 mammal species, with high-quality annotations of 26 keypoints and, importantly, the pose and shape parameters of the SMAL model.
Ranked #1 on Animal Pose Estimation on Animal3D
no code implementations • 19 Aug 2023 • Kaihang Pan, Juncheng Li, Wenjie Wang, Hao Fei, Hongye Song, Wei Ji, Jun Lin, Xiaozhong Liu, Tat-Seng Chua, Siliang Tang
Recent studies indicate that dense retrieval models struggle to perform well on a wide variety of retrieval tasks that lack dedicated training data, as different retrieval tasks often entail distinct search intents.
1 code implementation • 8 Aug 2023 • Wei Ji, Xiangyan Liu, An Zhang, Yinwei Wei, Yongxin Ni, Xiang Wang
To be specific, we first introduce an ID-aware Multi-modal Transformer module in the item representation learning stage to facilitate information interaction among different features.
1 code implementation • 8 Aug 2023 • Juncheng Li, Kaihang Pan, Zhiqi Ge, Minghe Gao, Hanwang Zhang, Wei Ji, Wenqiao Zhang, Tat-Seng Chua, Siliang Tang, Yueting Zhuang
This shortcoming results in MLLMs' underperformance in comprehending demonstrative instructions consisting of multiple, interleaved, and multimodal instructions that demonstrate the required context to complete a task.
1 code implementation • 28 Jul 2023 • Li Li, Wei Ji, Yiming Wu, Mengze Li, You Qin, Lina Wei, Roger Zimmermann
To ensure consistency and accuracy during the transfer process, we propose to measure the invariance of representations in each predicate class, and learn unbiased prototypes of predicates with different intensities.
Ranked #3 on Panoptic Scene Graph Generation on PSG Dataset
no code implementations • 18 Jul 2023 • Meng Wei, Long Chen, Wei Ji, Xiaoyu Yue, Roger Zimmermann
While recent video-based methods utilizing video tubelets have shown promising results, we argue that the effective modeling of spatial and temporal context plays a more significant role than the choice between clip tubelets and video tubelets.
no code implementations • 20 May 2023 • Shengqiong Wu, Hao Fei, Wei Ji, Tat-Seng Chua
Unpaired cross-lingual image captioning has long suffered from irrelevancy and disfluency issues, due to the inconsistencies of the semantic scene and syntax attributes during transfer.
1 code implementation • 19 May 2023 • Yu Zhao, Hao Fei, Wei Ji, Jianguo Wei, Meishan Zhang, Min Zhang, Tat-Seng Chua
With an external 3D scene extractor, we obtain the 3D objects and scene features for input images, based on which we construct a target object-centered 3D spatial scene graph (Go3D-S2G), such that we model the spatial semantics of target objects within the holistic 3D scenes.
1 code implementation • NeurIPS 2023 • Ao Zhang, Hao Fei, Yuan YAO, Wei Ji, Li Li, Zhiyuan Liu, Tat-Seng Chua
While developing a new multimodal LLM (MLLM) by pre-training on tremendous image-text pairs from scratch can be exceedingly resource-consuming, connecting an existing LLM with a comparatively lightweight visual prompt generator (VPG) becomes a feasible paradigm.
3 code implementations • 25 Apr 2023 • Junde Wu, Wei Ji, Yuanpei Liu, Huazhu Fu, Min Xu, Yanwu Xu, Yueming Jin
In Med-SA, we propose Space-Depth Transpose (SD-Trans) to adapt 2D SAM to 3D medical images and Hyper-Prompting Adapter (HyP-Adpt) to achieve prompt-conditioned adaptation.
1 code implementation • 12 Apr 2023 • Wei Ji, Jingjing Li, Qi Bi, TingWei Liu, Wenbo Li, Li Cheng
Recently, Meta AI Research released a general, promptable Segment Anything Model (SAM) pre-trained on an unprecedentedly large segmentation dataset (SA-1B).
1 code implementation • ICCV 2023 • Qifan Yu, Juncheng Li, Yu Wu, Siliang Tang, Wei Ji, Yueting Zhuang
Based on that, we further introduce a novel Entangled cross-modal prompt approach for open-world predicate scene graph generation (Epic), where models can generalize to unseen predicates in a zero-shot manner.
no code implementations • ICCV 2023 • Juncheng Li, Minghe Gao, Longhui Wei, Siliang Tang, Wenqiao Zhang, Mengze Li, Wei Ji, Qi Tian, Tat-Seng Chua, Yueting Zhuang
Prompt tuning, a recently emerging paradigm, enables the powerful vision-language pre-training models to adapt to downstream tasks in a parameter- and data-efficient way, by learning "soft prompts" to condition frozen pre-training models.
no code implementations • 25 Feb 2023 • Zhongyi Guo, Keji Han, Yao Ge, Wei Ji, Yun Li
In this paper, AAP is defined as the recognition of three signatures, i.e., attack algorithm, victim model, and hyperparameter.
2 code implementations • 19 Jan 2023 • Junde Wu, Wei Ji, Huazhu Fu, Min Xu, Yueming Jin, Yanwu Xu
To effectively integrate these two cutting-edge techniques for medical image segmentation, we propose a novel Transformer-based diffusion framework, called MedSegDiff-V2.
no code implementations • CVPR 2023 • Mengze Li, Han Wang, Wenqiao Zhang, Jiaxu Miao, Zhou Zhao, Shengyu Zhang, Wei Ji, Fei Wu
WINNER first builds the language decomposition tree in a bottom-up manner, upon which the structural attention mechanism and top-down feature backtracking jointly build a multi-modal decomposition tree, permitting a hierarchical understanding of unstructured videos.
1 code implementation • CVPR 2023 • Wei Ji, Jingjing Li, Cheng Bian, Zongwei Zhou, Jiaying Zhao, Alan L. Yuille, Li Cheng
This gives rise to significantly more robust segmentation of image objects in complex scenes and under adverse conditions.
1 code implementation • CVPR 2023 • Wei Ji, Renjie Liang, Zhedong Zheng, Wenqiao Zhang, Shengyu Zhang, Juncheng Li, Mengze Li, Tat-Seng Chua
Moreover, we treat the uncertainty scores of the frames in a video as a whole to estimate the difficulty of each video, which further relieves the burden of video selection.
no code implementations • 26 Dec 2022 • Wei Ji, Long Chen, Yinwei Wei, Yiming Wu, Tat-Seng Chua
In this work, we propose a novel multi-resolution temporal video sentence grounding network: MRTNet, which consists of a multi-modal feature encoder, a Multi-Resolution Temporal (MRT) module, and a predictor module.
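The core idea behind a multi-resolution temporal module — viewing the same frame-feature sequence at several temporal scales — can be illustrated with a toy average-pooling sketch. The function names and pooling factors are assumptions for illustration, not MRTNet's actual code.

```python
def avg_pool(seq, factor):
    """Average-pool a 1-D feature sequence by an integer factor."""
    return [sum(seq[i:i + factor]) / len(seq[i:i + factor])
            for i in range(0, len(seq), factor)]

def multi_resolution(seq, factors=(1, 2, 4)):
    """Build coarse-to-fine temporal views of one video feature
    sequence: factor 1 keeps full resolution, larger factors
    summarize progressively longer spans."""
    return {f: avg_pool(seq, f) for f in factors}

# One scalar feature per frame (made-up values) for an 8-frame clip.
seq = [1, 2, 3, 4, 5, 6, 7, 8]
views = multi_resolution(seq)
```

A grounding head could then score candidate moments at each resolution and fuse the results, trading temporal precision against context length.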
1 code implementation • 22 Dec 2022 • Yali Du, Yinwei Wei, Wei Ji, Fan Liu, Xin Luo, Liqiang Nie
The booming development and huge market of micro-videos bring new e-commerce channels for merchants.
no code implementations • 21 Dec 2022 • Tu Xu, Kan Wu, Yongdong Zhu, Wei Ji
This paper proposes a new driving style recognition approach that allows autonomous vehicles (AVs) to perform trajectory predictions for surrounding vehicles with minimal data.
1 code implementation • 14 Nov 2022 • Yiyang Chen, Zhedong Zheng, Wei Ji, Leigang Qu, Tat-Seng Chua
The key idea underpinning the proposed method is to integrate fine- and coarse-grained retrieval as matching data points with small and large fluctuations, respectively.
no code implementations • 21 Jul 2022 • Yang Chen, Shanshan Zhao, Wei Ji, Mingming Gong, Liping Xie
However, facing a new environment where the test data occurs online and differs from the training data in the RGB image content and depth sparsity, the trained model might suffer severe performance drop.
1 code implementation • ACM SIGIR Conference on Research and Development in Information Retrieval 2022 • Chenchen Ye, Lizi Liao, Fuli Feng, Wei Ji, Tat-Seng Chua
Existing approaches either 1) predict structured dialog acts first and then generate natural response; or 2) map conversation context to natural responses directly in an end-to-end manner.
1 code implementation • CVPR 2022 • Yicong Li, Xiang Wang, Junbin Xiao, Wei Ji, Tat-Seng Chua
At its core is understanding the alignments between visual scenes in video and linguistic semantics in question to yield the answer.
1 code implementation • 23 May 2022 • Yuan YAO, Qianyu Chen, Ao Zhang, Wei Ji, Zhiyuan Liu, Tat-Seng Chua, Maosong Sun
We show that PEVL enables state-of-the-art performance of detector-free VLP models on position-sensitive tasks such as referring expression comprehension and phrase grounding, and also improves the performance on position-insensitive tasks with grounded inputs.
Ranked #1 on Visual Commonsense Reasoning on VCR (Q-AR) test
1 code implementation • ICLR 2022 • Wei Ji, Jingjing Li, Qi Bi, Chuan Guo, Jie Liu, Li Cheng
The laborious and time-consuming manual annotation has become a real bottleneck in various practical scenarios.
1 code implementation • 27 Apr 2022 • Zhedong Zheng, Jiayin Zhu, Wei Ji, Yi Yang, Tat-Seng Chua
This research aims to study a self-supervised 3D clothing reconstruction method, which recovers the geometry shape and texture of human clothing from a single image.
Ranked #1 on Single-View 3D Reconstruction on CUB-200-2011
2 code implementations • 22 Mar 2022 • Ao Zhang, Yuan YAO, Qianyu Chen, Wei Ji, Zhiyuan Liu, Maosong Sun, Tat-Seng Chua
Scene graph generation (SGG) is designed to extract (subject, predicate, object) triplets in images.
Ranked #1 on Predicate Classification on Visual Genome
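The (subject, predicate, object) triplet formulation above can be made concrete with a minimal, hypothetical extraction step that keeps the top-scoring predicate per object pair — a toy data-structure sketch, not the paper's model.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Triplet:
    subject: str
    predicate: str
    obj: str

def extract_triplets(detections, predicate_scores, threshold=0.5):
    """Keep (subject, predicate, object) triplets whose best
    predicate score clears the threshold and whose endpoints
    were actually detected."""
    triplets = []
    for (subj, obj), scores in predicate_scores.items():
        pred, score = max(scores.items(), key=lambda kv: kv[1])
        if score >= threshold and subj in detections and obj in detections:
            triplets.append(Triplet(subj, pred, obj))
    return triplets

# Hypothetical detector and predicate-classifier outputs.
detections = {"person", "horse", "field"}
predicate_scores = {
    ("person", "horse"): {"riding": 0.9, "beside": 0.1},
    ("horse", "field"): {"standing on": 0.7},
    ("person", "field"): {"on": 0.3},
}
graph = extract_triplets(detections, predicate_scores)
```

The low-scoring ("person", "field") pair is dropped, leaving a two-edge scene graph.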
1 code implementation • 2 Mar 2022 • Yaoyao Zhong, Junbin Xiao, Wei Ji, Yicong Li, Weihong Deng, Tat-Seng Chua
Video Question Answering (VideoQA) aims to answer natural language questions according to the given videos.
1 code implementation • 26 Feb 2022 • Guanghao Yin, Wei Wang, Zehuan Yuan, Chuchu Han, Wei Ji, Shouqian Sun, Changhu Wang
The comparisons of distribution differences between HQ and LQ images can help our model better assess the image quality.
no code implementations • CVPR 2022 • Jingjing Li, Tianyu Yang, Wei Ji, Jue Wang, Li Cheng
Inspired by recent success in unsupervised contrastive representation learning, we propose a novel denoised cross-video contrastive algorithm, aiming to enhance the feature discrimination ability of video snippets for accurate temporal action localization in the weakly-supervised setting.
1 code implementation • CVPR 2022 • Chuan Guo, Shihao Zou, Xinxin Zuo, Sen Wang, Wei Ji, Xingyu Li, Li Cheng
Automated generation of 3D human motions from text is a challenging problem.
Ranked #6 on Motion Synthesis on InterHuman
1 code implementation • 12 Dec 2021 • Junbin Xiao, Angela Yao, Zhiyuan Liu, Yicong Li, Wei Ji, Tat-Seng Chua
To align with the multi-granular essence of linguistic concepts in language queries, we propose to model video as a conditional graph hierarchy which weaves together visual facts of different granularity in a level-wise manner, with the guidance of corresponding textual cues.
Ranked #25 on Video Question Answering on NExT-QA
1 code implementation • 10 Dec 2021 • Meng Wei, Long Chen, Wei Ji, Xiaoyu Yue, Tat-Seng Chua
Since each verb is associated with a specific set of semantic roles, all existing GSR methods resort to a two-stage framework: predicting the verb in the first stage and detecting the semantic roles in the second stage.
Ranked #3 on Situation Recognition on imSitu
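The two-stage GSR framework described above — predict the verb first, then fill its fixed semantic-role frame — can be sketched with stub classifiers. The verb frames, names, and scores are made up for illustration, not taken from imSitu or the paper.

```python
# Each verb fixes its own semantic-role frame (toy examples).
VERB_ROLES = {
    "riding": ["agent", "vehicle", "place"],
    "eating": ["agent", "food", "place"],
}

def predict_verb(verb_scores):
    """Stage 1: pick the most likely verb for the image."""
    return max(verb_scores, key=verb_scores.get)

def detect_roles(verb, noun_scores):
    """Stage 2: fill each role in the chosen verb's frame with its
    highest-scoring noun."""
    return {role: max(noun_scores[role], key=noun_scores[role].get)
            for role in VERB_ROLES[verb]}

verb = predict_verb({"riding": 0.8, "eating": 0.2})
frame = detect_roles(verb, {
    "agent": {"man": 0.9, "dog": 0.1},
    "vehicle": {"horse": 0.7, "bike": 0.3},
    "place": {"field": 0.6, "street": 0.4},
})
```

Because the role set depends on the stage-1 verb, an early verb error corrupts every stage-2 role prediction — the cascade weakness such two-stage pipelines are often criticized for.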
1 code implementation • NeurIPS 2021 • Jingjing Li, Wei Ji, Qi Bi, Cheng Yan, Miao Zhang, Yongri Piao, Huchuan Lu, Li Cheng
As a by-product, a CapS dataset is constructed by augmenting existing benchmark training set with additional image tags and captions.
1 code implementation • 16 Nov 2021 • Andras Huebner, Wei Ji, Xiang Xiao
Lastly, we compare the performance of our baseline models with BART, a state-of-the-art language model that is effective for summarization.
4 code implementations • 1 Nov 2021 • Guanghua Yu, Qinyao Chang, Wenyu Lv, Chang Xu, Cheng Cui, Wei Ji, Qingqing Dang, Kaipeng Deng, Guanzhong Wang, Yuning Du, Baohua Lai, Qiwen Liu, Xiaoguang Hu, dianhai yu, Yanjun Ma
We investigate the applicability of the anchor-free strategy on lightweight object detection models.
Ranked #1 on Object Detection on MSCOCO
no code implementations • 29 Sep 2021 • Chenchen Ye, Lizi Liao, Fuli Feng, Wei Ji, Tat-Seng Chua
The core is to construct a latent content space for strategy optimization and disentangle the surface style from it.
no code implementations • 24 Jun 2021 • Tianjie Yang, Yaoru Luo, Wei Ji, Ge Yang
We conclude with an outlook on how deep learning could shape the future of this new generation of light microscopy technology.
1 code implementation • CVPR 2021 • Wei Ji, Jingjing Li, Shuang Yu, Miao Zhang, Yongri Piao, Shunyu Yao, Qi Bi, Kai Ma, Yefeng Zheng, Huchuan Lu, Li Cheng
Complex backgrounds and similar appearances between objects and their surroundings are generally recognized as challenging scenarios in Salient Object Detection (SOD).
Ranked #13 on Thermal Image Segmentation on RGB-T-Glass-Segmentation
1 code implementation • CVPR 2021 • Wei Ji, Shuang Yu, Junde Wu, Kai Ma, Cheng Bian, Qi Bi, Jingjing Li, Hanruo Liu, Li Cheng, Yefeng Zheng
To our knowledge, our work is the first in producing calibrated predictions under different expertise levels for medical image segmentation.
1 code implementation • 3 Jun 2021 • Xun Yang, Fuli Feng, Wei Ji, Meng Wang, Tat-Seng Chua
To fill the research gap, we propose a causality-inspired VMR framework that builds a structural causal model to capture the true effect of the query and video content on the prediction.
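The general backdoor-adjustment idea behind such causal intervention can be shown with a tiny numeric example: instead of conditioning on the confounder's observed distribution, average the prediction over its marginal. The confounder choice and probabilities below are invented for illustration, not the paper's model.

```python
def backdoor_adjust(p_y_given_xz, p_z):
    """P(Y=1 | do(X)) = sum_z P(Y=1 | X, z) * P(z): average the
    conditional prediction over the confounder's marginal, cutting
    the spurious path through z."""
    return sum(p_y_given_xz[z] * p_z[z] for z in p_z)

# Hypothetical confounder z: temporal-location prior of the moment.
p_z = {"begin": 0.5, "middle": 0.3, "end": 0.2}
p_y_given_xz = {"begin": 0.9, "middle": 0.4, "end": 0.2}
adjusted = backdoor_adjust(p_y_given_xz, p_z)
```

The adjusted score (0.61 here) reflects the query/video evidence averaged over all location priors, rather than letting a dataset bias toward, say, video beginnings dominate.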
no code implementations • 26 May 2021 • Feifei Shao, Long Chen, Jian Shao, Wei Ji, Shaoning Xiao, Lu Ye, Yueting Zhuang, Jun Xiao
With the success of deep neural networks in object detection, both WSOD and WSOL have received unprecedented attention.
1 code implementation • 8 Apr 2021 • Guanghao Yin, Wei Wang, Zehuan Yuan, Wei Ji, Dongdong Yu, Shouqian Sun, Tat-Seng Chua, Changhu Wang
We extract the degradation prior at the task level with the proposed ConditionNet, which is then used to adapt the parameters of the basic SR network (BaseNet).
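The ConditionNet/BaseNet split is essentially a hypernetwork pattern: one network maps a degradation prior to parameters that modulate another. A deliberately tiny sketch, with made-up linear mappings standing in for the real learned networks:

```python
def condition_net(degradation_prior):
    """Map a degradation prior (here: blur level, noise level) to
    modulation parameters for the base network. Toy linear mapping,
    not the paper's learned ConditionNet."""
    blur, noise = degradation_prior
    return {"scale": 1.0 + 0.5 * blur, "shift": -0.1 * noise}

def base_net(feature, params):
    """Apply the predicted scale/shift modulation to a feature value,
    standing in for BaseNet's adapted layers."""
    return feature * params["scale"] + params["shift"]

params = condition_net((0.4, 1.0))  # heavier blur, unit noise
out = base_net(2.0, params)
```

The key property is that a single BaseNet serves many degradation types because its effective parameters change per task rather than per pixel.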
no code implementations • 15 Mar 2021 • Shaoning Xiao, Long Chen, Songyang Zhang, Wei Ji, Jian Shao, Lu Ye, Jun Xiao
State-of-the-art NLVL methods are almost all one-stage, and can typically be grouped into two categories: 1) anchor-based approaches, which first pre-define a series of video segment candidates (e.g., by sliding window) and then classify each candidate; and 2) anchor-free approaches, which directly predict the probability of each video frame being a boundary or an intermediate frame of the positive segment.
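The anchor-based candidate enumeration can be sketched directly: slide windows of several sizes over the frame axis and classify each resulting segment. Window sizes and stride below are arbitrary illustration values.

```python
def sliding_window_candidates(num_frames, window_sizes, stride):
    """Anchor-based NLVL, stage one: enumerate (start, end) segment
    candidates by sliding windows of several sizes over the frames."""
    candidates = []
    for w in window_sizes:
        for start in range(0, num_frames - w + 1, stride):
            candidates.append((start, start + w))
    return candidates

# A 10-frame clip with two window sizes and stride 2.
candidates = sliding_window_candidates(10, window_sizes=(4, 6), stride=2)
```

Each candidate would then be scored against the query; anchor-free methods skip this enumeration and score frames directly.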
1 code implementation • ICCV 2021 • Miao Zhang, Jie Liu, Yifei Wang, Yongri Piao, Shunyu Yao, Wei Ji, Jingjing Li, Huchuan Lu, Zhongxuan Luo
Our bidirectional dynamic fusion strategy encourages the interaction of spatial and temporal information in a dynamic manner.
Ranked #12 on Video Polyp Segmentation on SUN-SEG-Easy (Unseen)
no code implementations • 1 Jan 2021 • Zhuoyu Wei, Wei Ji, Xiubo Geng, Yining Chen, Baihua Chen, Tao Qin, Daxin Jiang
We notice that some real-world QA tasks are more complex and cannot be solved by end-to-end neural networks or translated into any kind of formal representation.
1 code implementation • 24 Aug 2020 • Mengyu Zhou, Qingtao Li, Xinyi He, Yuejiang Li, Yibo Liu, Wei Ji, Shi Han, Yining Chen, Daxin Jiang, Dongmei Zhang
It is common for people to create different types of charts to explore a multi-dimensional dataset (table).
2 code implementations • ECCV 2020 • Wei Ji, Jingjing Li, Miao Zhang, Yongri Piao, Huchuan Lu
The explicitly extracted edge information goes together with saliency to give more emphasis to the salient regions and object boundaries.
Ranked #19 on RGB-D Salient Object Detection on NJU2K
no code implementations • 30 Apr 2020 • Jing Han, Kun Qian, Meishu Song, Zijiang Yang, Zhao Ren, Shuo Liu, Juan Liu, Huaiyuan Zheng, Wei Ji, Tomoya Koike, Xiao Li, Zixing Zhang, Yoshiharu Yamamoto, Björn W. Schuller
In particular, by analysing speech recordings from these patients, we construct audio-only-based models to automatically categorise the health state of patients from four aspects, including the severity of illness, sleep quality, fatigue, and anxiety.
no code implementations • 6 Oct 2018 • Yiming Wu, Wei Ji, Xi Li, Gang Wang, Jianwei Yin, Fei Wu
As a fundamental and challenging problem in computer vision, hand pose estimation aims to estimate the hand joint locations from depth images.