1 code implementation • 30 Apr 2024 • Hang Du, Sicheng Zhang, Binzhu Xie, Guoshun Nan, Jiayang Zhang, Junrui Xu, Hangyu Liu, Sicong Leng, Jiangming Liu, Hehe Fan, Dajiu Huang, Jing Feng, Linli Chen, Can Zhang, Xuhuan Li, Hao Zhang, Jianhang Chen, Qimei Cui, Xiaofeng Tao
In pursuit of these answers, we present a comprehensive benchmark for Causation Understanding of Video Anomaly (CUVA).
no code implementations • 30 Mar 2024 • Ruijie Quan, Wenguan Wang, Fan Ma, Hehe Fan, Yi Yang
We select the highest-scoring clusters and use their medoid nodes for the next iteration of clustering, until we obtain a hierarchical and informative representation of the protein.
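The iterative cluster-then-medoid procedure described above can be sketched as follows. This is a minimal, dependency-free illustration under assumptions not stated in the snippet: a toy k-means stands in for the paper's clustering, and cluster compactness stands in for its scoring function.

```python
import numpy as np

def kmeans(x, k, iters=10, seed=0):
    """Minimal k-means, used only to keep this sketch dependency-free."""
    rng = np.random.default_rng(seed)
    centers = x[rng.choice(len(x), size=k, replace=False)].copy()
    for _ in range(iters):
        labels = np.linalg.norm(x[:, None] - centers[None], axis=-1).argmin(axis=1)
        for j in range(k):
            if (labels == j).any():
                centers[j] = x[labels == j].mean(axis=0)
    return labels

def medoid(cluster):
    """Point with the smallest total distance to all others in the cluster."""
    d = np.linalg.norm(cluster[:, None] - cluster[None, :], axis=-1)
    return cluster[d.sum(axis=1).argmin()]

def hierarchical_medoid_clustering(feats, k=4, keep=2, levels=3):
    """Cluster, keep the highest-scoring clusters, recurse on their medoids."""
    for _ in range(levels):
        if len(feats) <= k:
            break
        labels = kmeans(feats, k)
        clusters = [feats[labels == j] for j in range(k) if (labels == j).any()]
        # toy score: compactness (tighter clusters score higher)
        scores = np.array([-c.std() for c in clusters])
        top = scores.argsort()[-keep:]
        feats = np.stack([medoid(clusters[j]) for j in top])
    return feats
```

Each level replaces the selected clusters by their medoid nodes, so the representation shrinks into a hierarchy of increasingly coarse, informative points.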
no code implementations • 24 Mar 2024 • Xiangpeng Yang, Linchao Zhu, Hehe Fan, Yi Yang
We find that the crux of the issue stems from the imprecise distribution of attention weights across designated regions, including inaccurate text-to-attribute control and attention leakage.
no code implementations • 15 Feb 2024 • Chao Wang, Hehe Fan, Ruijie Quan, Yi Yang
The protein is first processed by protein encoders and the PLP-former to produce protein embeddings, which are then projected by the adapter to conform with the LLM.
no code implementations • 9 Feb 2024 • Zhenglin Zhou, Fan Ma, Hehe Fan, Yi Yang
Specifically, we incorporate the FLAME into both 3D representation and score distillation: 1) FLAME-based 3D Gaussian splatting, driving 3D Gaussian points by rigging each point to a FLAME mesh.
1 code implementation • 29 Jan 2024 • Yuze Hao, Jianrong Zhang, Tao Zhuo, Fuan Wen, Hehe Fan
To address this problem, we propose a data-driven method for coarse motion refinement.
1 code implementation • 26 Dec 2023 • Hang Du, Guoshun Nan, Sicheng Zhang, Binzhu Xie, Junrui Xu, Hehe Fan, Qimei Cui, Xiaofeng Tao, Xudong Jiang
Multimodal Sarcasm Understanding (MSU) has a wide range of applications in the news field such as public opinion analysis and forgery detection.
no code implementations • 6 Dec 2023 • Xiaobo Hu, Youfang Lin, Hehe Fan, Shuo Wang, Zhihao Wu, Kai Lv
To this end, an agent needs to 1) learn a piece of certain knowledge about the relations of object categories in the world during training and 2) look for the target object based on the pre-learned object category relations and its moving trajectory in the current unseen environment.
no code implementations • 4 Dec 2023 • Xiaobo Hu, Youfang Lin, Yue Liu, Jinwen Wang, Shuo Wang, Hehe Fan, Kai Lv
Visual reinforcement learning has proven effective in solving control tasks with high-dimensional observations.
no code implementations • 27 Nov 2023 • Yu Lu, Linchao Zhu, Hehe Fan, Yi Yang
Text-to-video (T2V) generation is a rapidly growing research area that aims to translate the scenes, objects, and actions within complex video text into a sequence of coherent visual frames.
1 code implementation • 16 Oct 2023 • Tao Zhuo, Zhiyong Cheng, Hehe Fan, Mohan Kankanhalli
Existing CL methods usually reduce forgetting with task priors, i.e., using task identity or a subset of previously seen samples for model training.
1 code implementation • ICCV 2023 • Zhiqiang Shen, Xiaoxiao Sheng, Hehe Fan, Longguang Wang, Yulan Guo, Qiong Liu, Hao Wen, Xi Zhou
In this paper, we propose a Masked Spatio-Temporal Structure Prediction (MaST-Pre) method to capture the structure of point cloud videos without human annotations.
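The masked-prediction setup can be illustrated with a toy masking step. This is not MaST-Pre itself: the function below simply masks whole point indices across all frames as a stand-in for the paper's spatio-temporal point tubes, and the 75% ratio is an illustrative choice.

```python
import numpy as np

def mask_point_tubes(video, mask_ratio=0.75, seed=0):
    """Toy stand-in for spatio-temporal tube masking: hide the same
    point indices in every frame; a model would be trained to predict
    the masked structure from the visible points.
    video: (T, N, 3) array of point coordinates over T frames."""
    rng = np.random.default_rng(seed)
    T, N, _ = video.shape
    n_mask = int(N * mask_ratio)
    masked_idx = rng.choice(N, size=n_mask, replace=False)
    visible_idx = np.setdiff1d(np.arange(N), masked_idx)
    return video[:, visible_idx], video[:, masked_idx], masked_idx

video = np.random.default_rng(1).normal(size=(4, 64, 3))
visible, masked, masked_idx = mask_point_tubes(video)
```

The self-supervised signal comes entirely from the video itself: the reconstruction target is the masked geometry, so no human annotations are needed.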
no code implementations • ICCV 2023 • Xiaoxiao Sheng, Zhiqiang Shen, Gang Xiao, Longguang Wang, Yulan Guo, Hehe Fan
Instead of contrasting the representations of clips or frames, in this paper, we propose a unified self-supervised framework by conducting contrastive learning at the point level.
no code implementations • 31 Jul 2023 • Yue Zhang, Hehe Fan, Yi Yang, Mohan Kankanhalli
The proposed method, named Mixture of Depth and Point cloud video experts (DPMix), achieved first place in the 4D Action Segmentation Track of the HOI4D Challenge 2023.
no code implementations • 25 Jul 2023 • Yi Cheng, Hehe Fan, Dongyun Lin, Ying Sun, Mohan Kankanhalli, Joo-Hwee Lim
The main challenge in video question answering (VideoQA) is to capture and understand the complex spatial and temporal relations between objects based on given questions.
no code implementations • 13 Jul 2023 • Yi Cheng, Ziwei Xu, Fen Fang, Dongyun Lin, Hehe Fan, Yongkang Wong, Ying Sun, Mohan Kankanhalli
Our research focuses on the innovative application of a differentiable logic loss in the training to leverage the co-occurrence relations between verb and noun, as well as the pre-trained Large Language Models (LLMs) to generate the logic rules for the adaptation to unseen action labels.
1 code implementation • 23 May 2023 • Tao Zhuo, Zhiyong Cheng, Zan Gao, Hehe Fan, Mohan Kankanhalli
Experience Replay (ER) is a simple and effective rehearsal-based strategy, which optimizes the model with current training data and a subset of old samples stored in a memory buffer.
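The ER training loop described above can be sketched in a few lines. This is a generic illustration, not the paper's implementation: a reservoir-sampled buffer (a common choice for ER) stores old samples, and each step mixes the current data with a replayed subset.

```python
import random

class ReplayBuffer:
    """Reservoir-sampling memory buffer for Experience Replay (sketch)."""
    def __init__(self, capacity, seed=0):
        self.capacity = capacity
        self.data = []
        self.n_seen = 0
        self.rng = random.Random(seed)

    def add(self, sample):
        self.n_seen += 1
        if len(self.data) < self.capacity:
            self.data.append(sample)
        else:
            # keep each seen sample with probability capacity / n_seen
            j = self.rng.randrange(self.n_seen)
            if j < self.capacity:
                self.data[j] = sample

    def sample(self, batch_size):
        k = min(batch_size, len(self.data))
        return self.rng.sample(self.data, k)

# training loop sketch: optimize on current data plus replayed old samples
buffer = ReplayBuffer(capacity=100)
for step in range(500):
    current = step                 # stand-in for a real (input, label) pair
    replayed = buffer.sample(32)   # subset of old samples from the buffer
    # loss = train_step(current, replayed)  <- hypothetical model update
    buffer.add(current)
```

Because the buffer is a fixed-size uniform sample of everything seen so far, old tasks keep contributing gradient signal, which is what reduces forgetting.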
no code implementations • 13 Jan 2023 • Guangzhi Wang, Hehe Fan, Mohan Kankanhalli
To overcome these two challenges, we propose a unified Relation-Enhanced Transformer (RET) to improve representation discriminability for both point cloud and natural language queries.
no code implementations • ICCV 2023 • Ming Li, Xiangyu Xu, Hehe Fan, Pan Zhou, Jun Liu, Jia-Wei Liu, Jiahe Li, Jussi Keppo, Mike Zheng Shou, Shuicheng Yan
For the first time, we introduce vision Transformers into PPAR by treating a video as a tubelet sequence, and accordingly design two complementary mechanisms, i.e., sparsification and anonymization, to remove privacy from a spatio-temporal perspective.
no code implementations • CVPR 2023 • Hehe Fan, Linchao Zhu, Yi Yang, Mohan Kankanhalli
Deep neural networks on regular 1D lists (e.g., natural languages) and irregular 3D sets (e.g., point clouds) have made tremendous achievements.
2 code implementations • 15 Sep 2022 • Yi Wang, Zhiwen Fan, Tianlong Chen, Hehe Fan, Zhangyang Wang
Vision Transformers (ViTs) have proven effective in solving 2D image understanding tasks when trained on large-scale image datasets and, as a largely separate track, in modeling the 3D visual world, such as voxels or point clouds.
no code implementations • 5 Sep 2022 • Xiaoyu Feng, Heming Du, Yueqi Duan, Yongpan Liu, Hehe Fan
Effectively preserving and encoding structural features of objects in irregular and sparse LiDAR points is a key challenge for 3D object detection on point clouds.
1 code implementation • ICLR 2021 • Hehe Fan, Xin Yu, Yuhang Ding, Yi Yang, Mohan Kankanhalli
Then, a spatial convolution is employed to capture the local structure of points in the 3D space, and a temporal convolution is used to model the dynamics of the spatial regions along the time dimension.
Ranked #3 on 3D Action Recognition on NTU RGB+D
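The decoupled spatial-then-temporal scheme can be sketched with numpy. This toy version replaces the learned spatial convolution with k-nearest-neighbour mean pooling and assumes index-level point correspondence across frames (an assumption the actual method does not need); it only illustrates the factorization of space and time.

```python
import numpy as np

def spatial_aggregate(points, feats, k=8):
    """Pool each point's k nearest neighbours within one frame
    (a simplified stand-in for a learned spatial convolution)."""
    d = np.linalg.norm(points[:, None] - points[None, :], axis=-1)
    idx = np.argsort(d, axis=1)[:, :k]
    return feats[idx].mean(axis=1)

def temporal_conv(seq_feats, kernel):
    """1D convolution over the time axis, applied per point and channel."""
    T = len(seq_feats)
    r = len(kernel) // 2
    out = np.zeros_like(seq_feats)
    for t in range(T):
        for j, w in enumerate(kernel):
            tt = t + j - r
            if 0 <= tt < T:
                out[t] += w * seq_feats[tt]
    return out

# toy point cloud video: T frames, N points, C feature channels
T, N, C = 5, 32, 4
rng = np.random.default_rng(0)
pts = rng.normal(size=(T, N, 3))
fts = rng.normal(size=(T, N, C))
spatial = np.stack([spatial_aggregate(pts[t], fts[t]) for t in range(T)])
st_feats = temporal_conv(spatial, kernel=np.array([0.25, 0.5, 0.25]))
```

The key idea survives the simplification: local structure is captured per frame first, then a separate temporal operator models how those spatial regions evolve.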
no code implementations • CVPR 2022 • Hehe Fan, Xiaojun Chang, Wanyue Zhang, Yi Cheng, Ying Sun, Mohan Kankanhalli
In this paper, we propose an unsupervised domain adaptation method for deep point cloud representation learning.
1 code implementation • CVPR 2021 • Hehe Fan, Yi Yang, Mohan Kankanhalli
To capture the dynamics in point cloud videos, point tracking is usually employed.
Ranked #4 on 3D Action Recognition on NTU RGB+D
2 code implementations • 18 Oct 2019 • Hehe Fan, Yi Yang
We apply PointRNN, PointGRU and PointLSTM to moving point cloud prediction, which aims to predict the future trajectories of points in a set given their history movements.
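The task itself (not PointRNN) can be made concrete with the most naive baseline: extrapolating each point with its average historical velocity. This assumes index-level correspondence across frames, which the recurrent models do not require; it only shows the input/output contract of moving point cloud prediction.

```python
import numpy as np

def constant_velocity_forecast(history, horizon):
    """Naive baseline for moving point cloud prediction: extrapolate
    each point with the mean velocity observed over its history.
    history: (T, N, 3) past point positions."""
    velocity = np.diff(history, axis=0).mean(axis=0)   # (N, 3) per-point velocity
    last = history[-1]
    return np.stack([last + (s + 1) * velocity for s in range(horizon)])

rng = np.random.default_rng(0)
hist = np.cumsum(rng.normal(size=(4, 16, 3)), axis=0)  # 4 past frames, 16 points
future = constant_velocity_forecast(hist, horizon=3)   # (3, 16, 3) predicted frames
```

Recurrent point models like PointRNN improve on this by aggregating each point's state from its spatial neighbourhood rather than relying on fixed point identities.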
1 code implementation • ICCV 2019 • Qianyu Feng, Guoliang Kang, Hehe Fan, Yi Yang
In this paper, we exploit the semantic structure of open set data from two aspects: 1) Semantic Categorical Alignment, which aims to achieve good separability of target known classes by categorically aligning the centroid of target with the source.
1 code implementation • 6 Aug 2019 • Qianyu Feng, Yu Wu, Hehe Fan, Chenggang Yan, Yi Yang
By this novel cascaded captioning-revising mechanism, CRN can accurately describe images with unseen objects.
1 code implementation • 9 Jul 2019 • Yuhang Ding, Hehe Fan, Mingliang Xu, Yi Yang
However, a problem with the adaptive selection is that when an image has too many neighbors, it is more likely to attract other images as its neighbors.
no code implementations • 20 Apr 2019 • Hehe Fan, Linchao Zhu, Yi Yang
Predicting future frames in videos has become a promising direction of research for both computer vision and robot learning communities.
no code implementations • ICCV 2017 • Hehe Fan, Xiaojun Chang, De Cheng, Yi Yang, Dong Xu, Alexander G. Hauptmann
relevant) to the given event class, we formulate this task as a multi-instance learning (MIL) problem by taking each video as a bag and the video shots in each video as instances.
1 code implementation • 30 May 2017 • Hehe Fan, Liang Zheng, Yi Yang
Progressively, pedestrian clustering and the CNN model are improved simultaneously until algorithm convergence.
Ranked #12 on Unsupervised Person Re-Identification on DukeMTMC-reID