1 code implementation • 6 May 2024 • Haihong Hao, Shuo Wang, Huixia Ben, Yanbin Hao, Yansong Wang, Weiwei Wang
Specifically, we first process ME video frames and special frames or data in parallel with our cascaded Unimodal Space-Time Attention (USTA) to establish connections between subtle facial movements and specific facial areas.
Micro-Expression Recognition +1
1 code implementation • 28 Mar 2024 • Zhicai Wang, Longhui Wei, Tan Wang, Heyu Chen, Yanbin Hao, Xiang Wang, Xiangnan He, Qi Tian
Text-to-image (T2I) generative models have recently emerged as a powerful tool, enabling the creation of photo-realistic images and giving rise to a multitude of applications.
no code implementations • 23 Mar 2024 • Xingyu Zhu, Shuo Wang, Jinda Lu, Yanbin Hao, Haifeng Liu, Xiangnan He
Few-shot learning (FSL) based on manifold regularization aims to improve the recognition capacity of novel objects with limited training samples by mixing two samples from different categories with a blending factor.
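As a rough illustration of the sample-blending idea this abstract alludes to, the sketch below implements generic mixup-style interpolation of two samples and their labels with a Beta-distributed blending factor. This is a minimal sketch of the underlying augmentation primitive, not the paper's manifold-regularized formulation; the function name and `alpha` value are illustrative assumptions.

```python
import numpy as np

def mixup(x1, y1, x2, y2, alpha=0.75):
    """Blend two samples from different categories with a factor
    drawn from a Beta distribution (generic mixup-style blending)."""
    lam = np.random.beta(alpha, alpha)
    x = lam * x1 + (1.0 - lam) * x2  # blended input
    y = lam * y1 + (1.0 - lam) * y2  # blended (soft) label
    return x, y

# Toy example: two 4-dim feature vectors with one-hot labels.
x1, y1 = np.ones(4), np.array([1.0, 0.0])
x2, y2 = np.zeros(4), np.array([0.0, 1.0])
x, y = mixup(x1, y1, x2, y2)
# The blended features track the label weight of the first sample.
assert np.allclose(x, y[0] * np.ones(4))
```

In few-shot settings the blending factor effectively controls how far the synthesized sample sits between the two source categories.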
no code implementations • 30 Jan 2024 • Pengyuan Zhou, Lin Wang, Zhi Liu, Yanbin Hao, Pan Hui, Sasu Tarkoma, Jussi Kangasharju
This paper offers an insightful examination of how currently top-trending AI technologies, i.e., generative artificial intelligence (Generative AI) and large language models (LLMs), are reshaping the field of video technology, including video generation, understanding, and streaming.
no code implementations • 2 Jan 2024 • Qinglong Huang, Yong Liao, Yanbin Hao, Pengyuan Zhou
Neural radiance fields (NeRF) have been proposed as an innovative 3D representation method.
no code implementations • 8 Dec 2023 • Fangzhou Song, Bin Zhu, Yanbin Hao, Shuo Wang, Xiangnan He
Learning recipe and food image representation in common embedding space is non-trivial but crucial for cross-modal recipe retrieval.
no code implementations • 18 Nov 2023 • Haoran Li, Long Ma, Yong Liao, Lechao Cheng, Yanbin Hao, Pengyuan Zhou
First, we segment the objects and the background in a multi-object image.
no code implementations • 18 Sep 2023 • Yi Tan, Zhaofan Qiu, Yanbin Hao, Ting Yao, Xiangnan He, Tao Mei
In this paper, we propose a novel video augmentation strategy named Selective Volume Mixup (SV-Mix) to improve the generalization ability of deep models with limited training videos.
1 code implementation • 23 Aug 2023 • Jiarui Yu, Haoran Li, Yanbin Hao, Bin Zhu, Tong Xu, Xiangnan He
Particularly, we use adversarial training to teach CgT-GAN to mimic the phrases of an external text corpus and CLIP-based reward to provide semantic guidance.
1 code implementation • 15 May 2023 • Fangwen Wu, Jingxuan He, Yufei Yin, Yanbin Hao, Gang Huang, Lechao Cheng
This study introduces an efficacious approach, Masked Collaborative Contrast (MCC), to highlight semantic regions in weakly supervised semantic segmentation.
Contrastive Learning, Weakly Supervised Semantic Segmentation +1
no code implementations • 17 Mar 2023 • Haoran Li, Pengyuan Zhou, Yihang Lin, Yanbin Hao, Haiyong Xie, Yong Liao
Video prediction is a complex time-series forecasting task with great potential in many use cases.
1 code implementation • CVPR 2023 • Zhicai Wang, Yanbin Hao, Tingting Mu, Ouxiang Li, Shuo Wang, Xiangnan He
It is well-known that zero-shot learning (ZSL) can suffer severely from the problem of domain shift, where the true and learned data distributions for the unseen classes do not match.
1 code implementation • CVPR 2023 • Zhenhua Tang, Zhaofan Qiu, Yanbin Hao, Richang Hong, Ting Yao
On this basis, we devise STCFormer by stacking multiple STC blocks and further integrate a new Structure-enhanced Positional Embedding (SPE) into STCFormer to take the structure of the human body into consideration.
Ranked #6 on 3D Human Pose Estimation on MPI-INF-3DHP
1 code implementation • 15 Jul 2022 • Zhicai Wang, Yanbin Hao, Xingyu Gao, Hao Zhang, Shuo Wang, Tingting Mu, Xiangnan He
They use token-mixing layers to capture cross-token interactions, as opposed to the multi-head self-attention mechanism used by Transformers.
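The token-mixing layer mentioned here can be sketched as an MLP applied across the token dimension (independently per channel), in contrast to the pairwise similarity computation of self-attention. This is a generic MLP-Mixer-style sketch under assumed shapes, not the paper's specific architecture.

```python
import numpy as np

def token_mixing(x, w1, w2):
    """MLP-Mixer-style token mixing: a two-layer MLP applied across
    the token dimension, independently for each channel.
    x: (tokens, channels); w1: (hidden, tokens); w2: (tokens, hidden)."""
    h = np.maximum(0.0, w1 @ x)  # mix across token positions, then ReLU
    return w2 @ h                # project back to (tokens, channels)

rng = np.random.default_rng(0)
tokens, channels, hidden = 4, 8, 16
x = rng.standard_normal((tokens, channels))
w1 = rng.standard_normal((hidden, tokens))
w2 = rng.standard_normal((tokens, hidden))
out = token_mixing(x, w1, w2)
assert out.shape == (tokens, channels)
```

Because the mixing weights are fixed learned parameters rather than input-dependent attention scores, the cost is linear in the number of tokens.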
1 code implementation • 12 Jul 2022 • Hao Zhang, Lechao Cheng, Yanbin Hao, Chong-Wah Ngo
By replacing a vanilla 2D attention with the LAPS, we could adapt a static transformer into a video one, with zero extra parameters and negligible computation overhead ($\sim$2.6\%).
1 code implementation • 20 Apr 2022 • Yanbin Hao, Shuo Wang, Pei Cao, Xinjian Gao, Tong Xu, Jinmeng Wu, Xiangnan He
Attention mechanisms have significantly boosted the performance of video classification neural networks thanks to the utilization of perspective contexts.
1 code implementation • CVPR 2022 • Yanbin Hao, Hao Zhang, Chong-Wah Ngo, Xiangnan He
By utilizing calibrators to embed features with four different kinds of contexts in parallel, the learnt representation is expected to be more resilient to diverse types of activities.
Ranked #3 on Egocentric Activity Recognition on EGTEA
3 code implementations • 5 Aug 2021 • Hao Zhang, Yanbin Hao, Chong-Wah Ngo
It is worth noting that our TokShift transformer is a pioneering, purely convolution-free video transformer with computational efficiency for video understanding.
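The zero-parameter efficiency claimed here rests on a temporal shift operation: part of a token's channels are moved to adjacent frames in place of any learned temporal module. The sketch below shows a generic TSM/TokShift-style shift applied to per-frame [class] tokens; the `fold` ratio and zero-padding at the sequence boundary are illustrative assumptions, not the paper's exact configuration.

```python
import numpy as np

def token_shift(cls_tokens, fold=4):
    """Zero-parameter temporal shift (TSM/TokShift-style sketch):
    shift 1/fold of the channels forward in time, 1/fold backward,
    and leave the rest untouched. cls_tokens: (T, C) per-frame tokens."""
    T, C = cls_tokens.shape
    k = C // fold
    out = np.zeros_like(cls_tokens)
    out[1:, :k] = cls_tokens[:-1, :k]        # shift forward in time
    out[:-1, k:2*k] = cls_tokens[1:, k:2*k]  # shift backward in time
    out[:, 2*k:] = cls_tokens[:, 2*k:]       # unshifted channels
    return out

x = np.arange(12, dtype=float).reshape(3, 4)  # 3 frames, 4 channels
y = token_shift(x, fold=4)
assert y.shape == x.shape
assert y[1, 0] == x[0, 0]  # channel 0 carries the previous frame
```

Because the operation only permutes existing activations, it adds no parameters and almost no FLOPs, which is what allows a static image transformer to exchange information across frames.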
no code implementations • 17 Mar 2021 • Zhenguang Liu, Kedi Lyu, Shuang Wu, Haipeng Chen, Yanbin Hao, Shouling Ji
Our method is compelling in that it enables manipulable motion prediction across activity types and allows customization of the human movement in a variety of fine-grained ways.
1 code implementation • ICCV 2021 • Zhenguang Liu, Pengxiang Su, Shuang Wu, Xuanjing Shen, Haipeng Chen, Yanbin Hao, Meng Wang
Predicting human motion from a historical pose sequence is at the core of many applications in computer vision.
no code implementations • LREC 2020 • Jinmeng Wu, Yanbin Hao
In addition to the context information captured at each word position, we incorporate a new quantity, the context-information jump, to facilitate the attention-weight formulation.