no code implementations • ECCV 2020 • Pei-Pei Li, Huaibo Huang, Yibo Hu, Xiang Wu, Ran He, Zhenan Sun
To explore the age effects on facial images, we propose a Disentangled Adversarial Autoencoder (DAAE) to disentangle the facial images into three independent factors: age, identity and extraneous information.
no code implementations • 28 May 2024 • Haogeng Liu, Quanzeng You, Xiaotian Han, Yongfei Liu, Huaibo Huang, Ran He, Hongxia Yang
In the realm of Multimodal Large Language Models (MLLMs), the vision-language connector plays a crucial role in linking pre-trained vision encoders with Large Language Models (LLMs).
no code implementations • 22 May 2024 • Qihang Fan, Huaibo Huang, Mingrui Chen, Ran He
The Vision Transformer (ViT) has gained prominence for its superior relational modeling prowess.
no code implementations • 22 May 2024 • Qihang Fan, Huaibo Huang, Mingrui Chen, Ran He
In recent years, Transformers have achieved remarkable progress in computer vision tasks.
no code implementations • 27 Mar 2024 • Qihang Fan, Quanzeng You, Xiaotian Han, Yongfei Liu, Yunzhe Tao, Huaibo Huang, Ran He, Hongxia Yang
Firstly, we propose a novel module for dynamic resolution adjustment, designed with a single Transformer block, specifically to achieve highly efficient incremental token integration.
no code implementations • 15 Mar 2024 • Nan Gao, Jia Li, Huaibo Huang, Zhi Zeng, Ke Shang, Shuwu Zhang, Ran He
Experimental results demonstrate the superiority of DiffMAC over state-of-the-art methods, with a high degree of generalization in real-world and heterogeneous settings.
no code implementations • 4 Mar 2024 • Xuannan Liu, Peipei Li, Huaibo Huang, Zekun Li, Xing Cui, Jiahao Liang, Lixiong Qin, Weihong Deng, Zhaofeng He
In this paper, we propose FakeNewsGPT4, a novel framework that augments Large Vision-Language Models (LVLMs) with forgery-specific knowledge for manipulation reasoning while inheriting extensive world knowledge as a complement.
no code implementations • 3 Mar 2024 • Haogeng Liu, Quanzeng You, Xiaotian Han, Yiqi Wang, Bohan Zhai, Yongfei Liu, Yunzhe Tao, Huaibo Huang, Ran He, Hongxia Yang
Multimodal Large Language Models (MLLMs) have experienced significant advancements recently.
Ranked #41 on Visual Question Answering on MM-Vet
no code implementations • 5 Dec 2023 • Yuang Ai, Huaibo Huang, Xiaoqiang Zhou, Jiexiang Wang, Ran He
Extensive experiments on 16 IR tasks underscore the superiority of MPerceiver in terms of adaptiveness, generalizability and fidelity.
1 code implementation • 3 Dec 2023 • Jin Liu, Huaibo Huang, Chao Jin, Ran He
Face stylization refers to the transformation of a face into a specific portrait style.
no code implementations • 28 Nov 2023 • Siyu Xing, Jie Cao, Huaibo Huang, Xiao-Yu Zhang, Ran He
First, we propose a coupling strategy to straighten trajectories, creating couplings between image and noise samples under diffusion model guidance.
no code implementations • 25 Nov 2023 • Xing Cui, Zekun Li, Pei Pei Li, Huaibo Huang, Zhaofeng He
We employ DDIM inversion to extract this noise from the reference image and leverage a diffusion model to generate new stylized images from the "style" noise.
no code implementations • 8 Oct 2023 • Tingkai Liu, Yunzhe Tao, Haogeng Liu, Qihang Fan, Ding Zhou, Huaibo Huang, Ran He, Hongxia Yang
We present a novel task and human-annotated dataset for evaluating the ability of visual-language models to generate captions and summaries for real-world video clips, which we call Video-CSR (Captioning, Summarization and Retrieval).
no code implementations • 8 Oct 2023 • Haogeng Liu, Qihang Fan, Tingkai Liu, Linjie Yang, Yunzhe Tao, Huaibo Huang, Ran He, Hongxia Yang
This paper proposes Video-Teller, a video-language foundation model that leverages multi-modal fusion and fine-grained modality alignment to significantly enhance the video-to-text generation task.
1 code implementation • 20 Sep 2023 • Qihang Fan, Huaibo Huang, Mingrui Chen, Hongmin Liu, Ran He
To alleviate these issues, we draw inspiration from the recent Retentive Network (RetNet) in the field of NLP, and propose RMT, a strong vision backbone with explicit spatial prior for general purposes.
2 code implementations • NeurIPS 2023 • Rui Wang, Peipei Li, Huaibo Huang, Chunshui Cao, Ran He, Zhaofeng He
Consequently, we propose a cross-modal ordinal pairwise loss to refine the CLIP feature space, where texts and images maintain both semantic alignment and ordering alignment.
1 code implementation • NeurIPS 2023 • Qihang Fan, Huaibo Huang, Xiaoqiang Zhou, Ran He
This paper proposes a Fully Adaptive Self-Attention (FASA) mechanism for vision transformers to model local and global information, as well as the bidirectional interaction between them, in context-aware ways.
1 code implementation • 31 Mar 2023 • Qihang Fan, Huaibo Huang, Jiyang Guan, Ran He
In CloFormer, the combination of AttnConv and vanilla attention, which uses pooling to reduce FLOPs, enables the model to perceive both high-frequency and low-frequency information.
Ranked #570 on Image Classification on ImageNet
no code implementations • 31 Mar 2023 • Yuang Ai, Xiaoqiang Zhou, Huaibo Huang, Lei Zhang, Ran He
Unsupervised Domain Adaptation (UDA) can effectively address domain gap issues in real-world image Super-Resolution (SR) by accessing both the source and target data.
no code implementations • ICCV 2023 • Peipei Li, Rui Wang, Huaibo Huang, Ran He, Zhaofeng He
Face aging is an ill-posed problem because multiple plausible aging patterns may correspond to a given input.
no code implementations • ICCV 2023 • Xiaoqiang Zhou, Huaibo Huang, Ran He, Zilei Wang, Jie Hu, Tieniu Tan
In particular, self-attention with cross-scale matching and convolution filters with different kernel sizes are designed to exploit the multi-scale features in images.
1 code implementation • CVPR 2023 • Huaibo Huang, Xiaoqiang Zhou, Jie Cao, Ran He, Tieniu Tan
STA decomposes vanilla global attention into multiplications of a sparse association map and a low-dimensional attention, leading to high efficiency in capturing global dependencies.
1 code implementation • 11 Oct 2022 • Zi Wang, Huaibo Huang, Aihua Zheng, Chenglong Li, Ran He
To alleviate these two issues, we propose a simple yet effective method with Parallel Augmentation and Dual Enhancement (PADE), which is robust on both occluded and non-occluded data and does not require any auxiliary clues.
no code implementations • CVPR 2022 • Gengyun Jia, Huaibo Huang, Chaoyou Fu, Ran He
In this paper, we regard image cropping as a set prediction problem.
1 code implementation • CVPR 2022 • Xin Xie, Yi Li, Huaibo Huang, Haiyan Fu, Wanwan Wang, Yanqing Guo
Style transfer has been well studied in recent years, with excellent performance achieved.
no code implementations • 20 Dec 2021 • Xin Ma, Xiaoqiang Zhou, Huaibo Huang, Gengyun Jia, Zhenhua Chai, Xiaolin Wei
This multi-scale architecture is beneficial for the decoder to utilize discriminative representations learned from encoders into images.
no code implementations • 20 Oct 2021 • Jianze Wei, Huaibo Huang, Muyi Sun, Yunlong Wang, Min Ren, Ran He, Zhenan Sun
To make iris segmentation more accurate and reliable, we propose a bilateral self-attention module and design the Bilateral Transformer (BiTrans) with a hierarchical architecture by exploring spatial and visual relationships.
no code implementations • 4 Oct 2021 • Gege Gao, Huaibo Huang, Chaoyou Fu, Ran He
Human face synthesis involves transferring knowledge about the identity and identity-dependent face shape (IDFS) of a human face to target face images where the context (e.g., facial expressions, head poses, and other background factors) may change dramatically.
no code implementations • 3 Oct 2021 • Jia Li, Huaibo Huang, Xiaofei Jia, Ran He
Blind face restoration (BFR) is a challenging problem because of the uncertainty of the degradation patterns.
no code implementations • 29 Sep 2021 • Chenyu Liu, Jia Li, Junxian Duan, Huaibo Huang
The first is that capturing the general clue of artifacts is difficult.
no code implementations • CVPR 2021 • Huaibo Huang, Aijing Yu, Ran He
To address this issue, we propose a memory-oriented semi-supervised (MOSS) method which enables the network to explore and exploit the properties of rain streaks from both synthetic and real data.
1 code implementation • CVPR 2021 • Gege Gao, Huaibo Huang, Chaoyou Fu, Zhaoyang Li, Ran He
In this work, we propose a novel information disentangling and swapping network, called InfoSwap, to extract the most expressive information for identity representation from a pre-trained face recognition model.
no code implementations • 29 Oct 2020 • Xin Ma, Xiaoqiang Zhou, Huaibo Huang, Zhenhua Chai, Xiaolin Wei, Ran He
It is difficult for encoders to capture such powerful representations under this complex situation.
1 code implementation • 20 Sep 2020 • Chaoyou Fu, Xiang Wu, Yibo Hu, Huaibo Huang, Ran He
As a consequence, massive numbers of new, diverse paired heterogeneous images with the same identity can be generated from noise.
no code implementations • 20 Apr 2020 • Yi Li, Huaibo Huang, Junchi Yu, Ran He, Tieniu Tan
Face verification aims at determining whether a pair of face images belongs to the same identity.
no code implementations • ECCV 2020 • Jie Cao, Huaibo Huang, Yi Li, Ran He, Zhenan Sun
The performance of multi-domain image-to-image translation has been significantly improved by recent progress in deep generative models.
no code implementations • 21 Dec 2019 • Xin Ma, Yi Li, Huaibo Huang, Mandi Luo, Ran He
Real-world image super-resolution (SR) is a challenging image translation problem.
no code implementations • 17 Dec 2019 • Aijing Yu, Haoxue Wu, Huaibo Huang, Zhen Lei, Ran He
A spectral conditional attention module is introduced to reduce the domain gap between NIR and VIS data and then improve the performance of NIR-VIS heterogeneous face recognition on various databases including the LAMP-HQ.
no code implementations • NeurIPS 2019 • Chaoyou Fu, Xiang Wu, Yibo Hu, Huaibo Huang, Ran He
Specifically, we first introduce a dual variational autoencoder to represent a joint distribution of paired heterogeneous images.
no code implementations • 14 Apr 2019 • Jie Cao, Huaibo Huang, Yi Li, Jingtuo Liu, Ran He, Zhenan Sun
In this work, we present a novel training framework for GANs, namely biphasic learning, to achieve image-to-image translation in multiple visual domains at $1024^2$ resolution.
no code implementations • 30 Mar 2019 • Pei-Pei Li, Huaibo Huang, Yibo Hu, Xiang Wu, Ran He, Zhenan Sun
UVA is the first attempt to achieve facial age analysis tasks, including age translation, age generation and age estimation, in a universal framework.
1 code implementation • 25 Mar 2019 • Chaoyou Fu, Xiang Wu, Yibo Hu, Huaibo Huang, Ran He
Then, to ensure the identity consistency of the generated paired heterogeneous images, we impose a distribution alignment in the latent space and a pairwise identity-preserving constraint in the image space.
Ranked #1 on Face Verification on CASIA NIR-VIS 2.0
no code implementations • 26 Dec 2018 • Xin Zheng, Yanqing Guo, Huaibo Huang, Yi Li, Ran He
Deep learning based facial attribute analysis consists of two basic sub-issues: facial attribute estimation (FAE), which recognizes whether facial attributes are present in given images, and facial attribute manipulation (FAM), which synthesizes or removes desired facial attributes.
no code implementations • 17 Dec 2018 • Hao Zhu, Huaibo Huang, Yi Li, Aihua Zheng, Ran He
Talking face generation aims to synthesize a face video with precise lip synchronization as well as a smooth transition of facial motion over the entire video via the given speech clip and facial image.
no code implementations • 6 Sep 2018 • Xiang Wu, Huaibo Huang, Vishal M. Patel, Ran He, Zhenan Sun
Visible (VIS) to near-infrared (NIR) face matching is a challenging problem due to the significant discrepancy between the two domains and a lack of sufficient data for training cross-modal matching algorithms.
Ranked #2 on Face Verification on BUAA-VisNir
3 code implementations • NeurIPS 2018 • Huaibo Huang, Zhihang Li, Ran He, Zhenan Sun, Tieniu Tan
On the other hand, the inference model is encouraged to classify between the generated and real samples while the generator tries to fool it, as in GANs.
no code implementations • 11 Jul 2018 • Huaibo Huang, Lingxiao Song, Ran He, Zhenan Sun, Tieniu Tan
Variational capsules model an image as a composition of entities in a probabilistic model.
no code implementations • ICCV 2017 • Huaibo Huang, Ran He, Zhenan Sun, Tieniu Tan
Most modern face super-resolution methods resort to convolutional neural networks (CNNs) to infer high-resolution (HR) face images.
Ranked #3 on Face Hallucination on FFHQ 512 x 512 - 16x upscaling