no code implementations • 4 Jun 2024 • Jifei Luo, Hantao Yao, Changsheng Xu
However, existing techniques that construct the affinity graph based on pairwise instances can lead to the propagation of misinformation from outliers and other manifolds, resulting in inaccurate results.
1 code implementation • 24 May 2024 • Hantao Yao, Rui Zhang, Lu Yu, Changsheng Xu
Comprehensive evaluations across various benchmarks and tasks confirm SEP's efficacy in prompt tuning.
1 code implementation • 16 May 2024 • Yifan Xu, Xiaoshan Yang, Yaguang Song, Changsheng Xu
Specifically, we incorporate a routed visual expert with a cross-modal bridge module into a pretrained LLM to route the vision and language flows during attention computing to enable different attention patterns in inner-modal modeling and cross-modal interaction scenarios.
2 code implementations • 20 Apr 2024 • Linhui Xiao, Xiaoshan Yang, Fang Peng, YaoWei Wang, Changsheng Xu
Specifically, HiVG consists of a multi-layer adaptive cross-modal bridge and a hierarchical multimodal low-rank adaptation (HiLoRA) paradigm.
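The HiLoRA paradigm builds on standard low-rank adaptation (LoRA). As a rough illustration of the underlying LoRA update only (not the paper's hierarchical, multimodal variant), a frozen weight matrix is augmented with a trainable low-rank residual; all names and values below are illustrative:

```python
import numpy as np

def lora_forward(x, W, A, B, alpha=16.0):
    """Low-rank adapted linear layer: y = x @ (W + scale * A @ B).

    W: frozen (d_in, d_out) pretrained weight; A: (d_in, r) and
    B: (r, d_out) are the only trainable parameters, with rank
    r << min(d_in, d_out).
    """
    r = A.shape[1]
    scale = alpha / r
    return x @ W + scale * (x @ A) @ B

rng = np.random.default_rng(0)
d_in, d_out, r = 64, 32, 4
W = rng.normal(size=(d_in, d_out))      # frozen pretrained weight
A = rng.normal(size=(d_in, r)) * 0.01   # trainable down-projection
B = np.zeros((r, d_out))                # trainable up-projection (init 0)
x = rng.normal(size=(2, d_in))

# With B initialized to zero, the adapted layer equals the frozen one.
assert np.allclose(lora_forward(x, W, A, B), x @ W)
```

A hierarchical scheme would apply such adapters at multiple layers or granularities; the paper defines how.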
2 code implementations • 9 Apr 2024 • Ming Tao, Bing-Kun Bao, Hao Tang, YaoWei Wang, Changsheng Xu
3) The story visualization and continuation models are trained and inferred independently, which is not user-friendly.
1 code implementation • 25 Jan 2024 • Nisha Huang, WeiMing Dong, Yuxin Zhang, Fan Tang, Ronghui Li, Chongyang Ma, Xiu Li, Changsheng Xu
Large-scale text-to-image generative models have made impressive strides, showcasing their ability to synthesize a vast array of high-quality images.
no code implementations • 21 Jan 2024 • Yukun Zuo, Hantao Yao, Lu Yu, Liansheng Zhuang, Changsheng Xu
Nonetheless, these learnable prompts tend to concentrate on the discriminative knowledge of the current task while ignoring past-task knowledge, so that learnable prompts still suffer from catastrophic forgetting.
1 code implementation • 11 Jan 2024 • Yukun Zuo, Hantao Yao, Liansheng Zhuang, Changsheng Xu
We introduce Hierarchical Augmentation and Distillation (HAD), which comprises the Hierarchical Augmentation Module (HAM) and Hierarchical Distillation Module (HDM) to efficiently utilize the hierarchical structure of data and models, respectively.
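The distillation half of HAD rests on standard knowledge distillation. A generic sketch of a distillation loss (temperature-softened KL divergence between teacher and student class distributions, not HAD's specific hierarchical formulation; all names and the temperature are illustrative):

```python
import numpy as np

def softmax(z, T=1.0):
    """Temperature-scaled softmax over the last axis."""
    z = z / T
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distill_loss(student_logits, teacher_logits, T=2.0):
    """Generic knowledge-distillation loss: KL(teacher || student) over
    temperature-softened class distributions, scaled by T^2 as usual."""
    p = softmax(teacher_logits, T)
    q = softmax(student_logits, T)
    return (T ** 2) * np.sum(p * (np.log(p) - np.log(q)), axis=-1).mean()

rng = np.random.default_rng(0)
t = rng.normal(size=(4, 10))
assert distill_loss(t, t) < 1e-9            # identical logits -> zero loss
assert distill_loss(rng.normal(size=(4, 10)), t) > 0.0
```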
1 code implementation • 25 Dec 2023 • Chengcheng Ma, Ismail Elezi, Jiankang Deng, WeiMing Dong, Changsheng Xu
For instance, on CIFAR-10-LT, CPE improves test accuracy by over 2.22% compared to baselines.
1 code implementation • 13 Dec 2023 • Shengsheng Qian, Yifei Wang, Dizhan Xue, Shengjie Zhang, Huaiwen Zhang, Changsheng Xu
After obtaining the threat model trained on the poisoned dataset, our method can precisely detect poisonous samples based on the assumption that masking the backdoor trigger can effectively change the activation of a downstream clustering model.
1 code implementation • 8 Dec 2023 • Yuxin Zhang, Fan Tang, Nisha Huang, Haibin Huang, Chongyang Ma, WeiMing Dong, Changsheng Xu
The essence of a video lies in its dynamic motions, including character actions, object movements, and camera movements.
1 code implementation • 30 Nov 2023 • Hantao Yao, Rui Zhang, Changsheng Xu
However, those textual tokens have a limited generalization ability regarding unseen domains, as they cannot dynamically adjust to the distribution of testing classes.
1 code implementation • 22 Nov 2023 • Junyu Gao, Xuan Yao, Changsheng Xu
Such agents are typically required to execute user instructions in an online manner, leading us to explore the use of unlabeled test samples for effective online model adaptation.
no code implementations • 12 Oct 2023 • Junyu Gao, Xinhong Ma, Changsheng Xu
Despite the great progress of unsupervised domain adaptation (UDA) with deep neural networks, current UDA models are opaque and cannot provide convincing explanations, limiting their application in scenarios that require safe and controllable model decisions.
1 code implementation • 5 Sep 2023 • Dizhan Xue, Shengsheng Qian, Zuyi Zhou, Changsheng Xu
In recent years, cross-modal reasoning (CMR), the process of understanding and reasoning across different modalities, has emerged as a pivotal area with applications spanning from multimedia analysis to healthcare diagnostics.
no code implementations • 30 Aug 2023 • Yifan Xu, Mengdan Zhang, Xiaoshan Yang, Changsheng Xu
In this paper, we explore, for the first time, helpful multi-modal contextual knowledge for understanding novel categories in open-vocabulary object detection (OVD).
no code implementations • 13 Jul 2023 • Jiaming Zhang, Jitao Sang, Qi Yi, Changsheng Xu
Harnessing the concept of non-robust features, we elaborate on two guiding principles for surrogate model selection to explain why the foundational model is an optimal choice for this role.
no code implementations • 5 Jul 2023 • Jie Fu, Junyu Gao, Changsheng Xu
In this paper, to balance the feature learning processes of different modalities, a dynamic gradient modulation (DGM) mechanism is explored, where a novel and effective metric function is designed to measure the imbalanced feature learning between audio and visual modalities.
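The general idea behind gradient modulation is to attenuate updates for whichever modality currently dominates so the weaker one can catch up. A toy sketch in that spirit (the paper's actual metric function and coefficients differ; `alpha` and the `tanh` shape here are illustrative assumptions):

```python
import numpy as np

def modulation_coeffs(score_audio, score_visual, alpha=1.0):
    """Toy gradient-modulation coefficients: the modality with the higher
    confidence score gets its gradients down-scaled; the weaker modality's
    gradients pass through unchanged."""
    rho = score_audio / score_visual               # imbalance ratio
    k_audio = 1.0 - np.tanh(alpha * max(rho - 1.0, 0.0))
    k_visual = 1.0 - np.tanh(alpha * max(1.0 / rho - 1.0, 0.0))
    return k_audio, k_visual

# Audio dominates -> its gradient coefficient shrinks, visual's stays at 1.
k_a, k_v = modulation_coeffs(0.9, 0.3)
assert k_a < 1.0 and k_v == 1.0
```

In training, each modality's gradients would be multiplied by its coefficient before the optimizer step.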
1 code implementation • NeurIPS 2023 • Yifan Xu, Mengdan Zhang, Chaoyou Fu, Peixian Chen, Xiaoshan Yang, Ke Li, Changsheng Xu
To address the learning inertia problem brought by the frozen detector, a vision conditioned masked language prediction strategy is proposed.
Ranked #2 on Few-Shot Object Detection on ODinW-35
1 code implementation • 25 May 2023 • Hantao Yao, Lu Yu, Jifei Luo, Changsheng Xu
In this paper, we propose a novel Identity Knowledge Evolution (IKE) framework for CIOR, consisting of the Identity Knowledge Association (IKA), Identity Knowledge Distillation (IKD), and Identity Knowledge Update (IKU).
3 code implementations • 25 May 2023 • Yuxin Zhang, WeiMing Dong, Fan Tang, Nisha Huang, Haibin Huang, Chongyang Ma, Tong-Yee Lee, Oliver Deussen, Changsheng Xu
We apply ProSpect in various personalized attribute-aware image generation applications, such as image-guided or text-driven manipulations of materials, style, and layout, achieving previously unattainable results from a single image input without fine-tuning the diffusion models.
2 code implementations • 15 May 2023 • Linhui Xiao, Xiaoshan Yang, Fang Peng, Ming Yan, YaoWei Wang, Changsheng Xu
In order to utilize vision and language pre-trained models to address the grounding problem, and reasonably take advantage of pseudo-labels, we propose CLIP-VG, a novel method that can conduct self-paced curriculum adapting of CLIP with pseudo-language labels.
no code implementations • WWW 2023 • Shengsheng Qian, Hong Chen, Dizhan Xue, Quan Fang, Changsheng Xu
To tackle these challenges, we propose an Open-World Social Event Classifier (OWSEC) model in this paper.
1 code implementation • CVPR 2023 • Hantao Yao, Rui Zhang, Changsheng Xu
Representative CoOp-based work combines the learnable textual tokens with the class tokens to obtain specific textual knowledge.
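Mechanically, CoOp-style prompt tuning prepends a set of shared learnable context vectors to each class's token embedding before the text encoder. A minimal sketch of that construction step (shapes and names are illustrative, and the downstream text encoder is omitted):

```python
import numpy as np

def build_prompts(ctx, class_embeds):
    """CoOp-style prompt construction: prepend M shared learnable context
    tokens to each class's token embedding.

    ctx:          (M, d)  learnable context tokens (shared across classes)
    class_embeds: (C, d)  fixed embedding of each class-name token
    returns:      (C, M + 1, d) one prompt sequence per class
    """
    C = class_embeds.shape[0]
    tiled = np.broadcast_to(ctx, (C,) + ctx.shape)                # (C, M, d)
    return np.concatenate([tiled, class_embeds[:, None, :]], axis=1)

rng = np.random.default_rng(0)
ctx = rng.normal(size=(4, 8))     # M = 4 context tokens, d = 8
cls = rng.normal(size=(3, 8))     # C = 3 classes
prompts = build_prompts(ctx, cls)
assert prompts.shape == (3, 5, 8)
```

Only `ctx` would be optimized; the class embeddings and the text encoder stay frozen.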
1 code implementation • 9 Mar 2023 • Yuxin Zhang, Fan Tang, WeiMing Dong, Haibin Huang, Chongyang Ma, Tong-Yee Lee, Changsheng Xu
Our framework consists of three key components, i.e., a parallel contrastive learning scheme for style representation and style transfer, a domain enhancement module for effective learning of style distribution, and a generative network for style transfer.
no code implementations • 1 Mar 2023 • Shangxi Wu, Qiuyang He, Fangzhao Wu, Jitao Sang, YaoWei Wang, Changsheng Xu
In this work, we found that the backdoor attack can construct an artificial bias similar to the model bias derived in standard training.
1 code implementation • 23 Feb 2023 • Nisha Huang, Fan Tang, WeiMing Dong, Tong-Yee Lee, Changsheng Xu
Different from current mask-based image editing methods, we propose a novel region-aware diffusion model (RDM) for entity-level image editing, which could automatically locate the region of interest and replace it following given text prompts.
2 code implementations • CVPR 2023 • Ming Tao, Bing-Kun Bao, Hao Tang, Changsheng Xu
The complex scene understanding ability of CLIP enables the discriminator to accurately assess the image quality.
Ranked #3 on Text-to-Image Generation on CUB
1 code implementation • CVPR 2023 • Junyu Gao, Mengyuan Chen, Changsheng Xu
We argue that, for an event residing in one modality, the modality itself should provide ample presence evidence of this event, while the other complementary modality is encouraged to afford the absence evidence as a reference signal.
1 code implementation • ICCV 2023 • Dizhan Xue, Shengsheng Qian, Changsheng Xu
To address these issues, we propose a Variational Causal Inference Network (VCIN) that establishes the causal correlation between predicted answers and explanations, and captures cross-modal relationships to generate rational explanations.
Ranked #1 on Explanatory Visual Question Answering on GQA-REX
no code implementations • CVPR 2023 • Sisi You, Hantao Yao, Bing-Kun Bao, Changsheng Xu
Multiple Object Tracking, which consists of object detection, feature embedding, and identity association, has recently achieved great success.
no code implementations • CVPR 2023 • Mengyuan Chen, Junyu Gao, Changsheng Xu
Aiming to recognize and localize action instances with only video-level labels during training, Weakly-supervised Temporal Action Localization (WTAL) has achieved significant progress in recent years.
1 code implementation • CVPR 2023 • Xi Zhang, Feifei Zhang, Changsheng Xu
Research on continual learning has recently led to a variety of work in the unimodal community; however, little attention has been paid to multimodal tasks such as visual question answering (VQA).
no code implementations • CVPR 2023 • Yuyang Wanyan, Xiaoshan Yang, Chaofan Chen, Changsheng Xu
In meta-training, we design an Active Sample Selection (ASS) module to organize query samples with large differences in the reliability of modalities into different groups based on modality-specific posterior distributions.
1 code implementation • CVPR 2023 • Jiaming Zhang, Xingjun Ma, Qi Yi, Jitao Sang, Yu-Gang Jiang, YaoWei Wang, Changsheng Xu
Furthermore, we propose to leverage Vision-and-Language Pre-trained Models (VLPMs) like CLIP as the surrogate model to improve the transferability of the crafted UCs to diverse domains.
no code implementations • 28 Nov 2022 • Fang Peng, Xiaoshan Yang, Linhui Xiao, YaoWei Wang, Changsheng Xu
Although significant progress has been made in few-shot learning, most existing few-shot image classification methods require supervised pre-training on a large number of samples of base classes, which limits their generalization ability in real-world applications.
1 code implementation • CVPR 2023 • Yuxin Zhang, Nisha Huang, Fan Tang, Haibin Huang, Chongyang Ma, WeiMing Dong, Changsheng Xu
Our key idea is to learn artistic style directly from a single painting and then guide the synthesis without providing complex textual descriptions.
1 code implementation • 19 Nov 2022 • Nisha Huang, Yuxin Zhang, Fan Tang, Chongyang Ma, Haibin Huang, Yong Zhang, WeiMing Dong, Changsheng Xu
Despite the impressive results of arbitrary image-guided style transfer methods, text-driven image stylization has recently been proposed for transferring a natural image into a stylized one according to textual descriptions of the target style provided by the user.
1 code implementation • 4 Nov 2022 • Chengcheng Ma, Yang Liu, Jiankang Deng, Lingxi Xie, WeiMing Dong, Changsheng Xu
Pretrained vision-language models (VLMs) such as CLIP have shown impressive generalization capability in downstream vision tasks with appropriate text prompts.
1 code implementation • ACM MM 2022 • Dizhan Xue, Shengsheng Qian, Quan Fang, Changsheng Xu
Finally, a multimodal transformer decoder constructs attention among multimodal features to learn the story dependency and generates informative, reasonable, and coherent story endings.
Ranked #1 on Image-guided Story Ending Generation on LSMDC-E
1 code implementation • 27 Sep 2022 • Nisha Huang, Fan Tang, WeiMing Dong, Changsheng Xu
Extensive experimental results on the quality and quantity of the generated digital art paintings confirm the effectiveness of the combination of the diffusion model and multimodal guidance.
1 code implementation • IEEE Transactions on Pattern Analysis and Machine Intelligence 2022 • Dizhan Xue, Shengsheng Qian, Quan Fang, Changsheng Xu
To date, most existing techniques convert multimodal data into a common representation space where semantic similarities between samples can be easily measured across multiple modalities.
no code implementations • 22 May 2022 • Yufan Hu, Junyu Gao, Changsheng Xu
Most existing state-of-the-art video classification methods assume that the training data obey a uniform distribution.
1 code implementation • 19 May 2022 • Yuxin Zhang, Fan Tang, WeiMing Dong, Haibin Huang, Chongyang Ma, Tong-Yee Lee, Changsheng Xu
Our framework consists of three key components, i.e., a multi-layer style projector for style code encoding, a domain enhancement module for effective learning of style distribution, and a generative network for image style transfer.
2 code implementations • 5 Apr 2022 • Jun Hu, Bryan Hooi, Shengsheng Qian, Quan Fang, Changsheng Xu
Based on a Markov process that trades off two types of distances, we present Markov Graph Diffusion Collaborative Filtering (MGDCF) to generalize some state-of-the-art GNN-based CF models.
Ranked #4 on Recommendation Systems on Gowalla
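The "Markov process trading off two distances" idea can be sketched as an iterative diffusion that balances staying close to the initial embeddings against smoothing over graph neighbors, in the spirit of personalized-PageRank propagation (this is a generic sketch, not MGDCF's exact formulation; `alpha`, `K`, and the toy graph are illustrative):

```python
import numpy as np

def markov_diffusion(A_norm, X0, alpha=0.1, K=10):
    """Iterative diffusion: X <- alpha * X0 + (1 - alpha) * A_norm @ X.

    alpha weights fidelity to the initial embeddings X0 against
    smoothing over neighbors via the row-normalized adjacency A_norm.
    """
    X = X0.copy()
    for _ in range(K):
        X = alpha * X0 + (1.0 - alpha) * A_norm @ X
    return X

# Tiny 3-node chain graph: row-normalized adjacency with self-loops.
A = np.array([[0.5, 0.5, 0.0],
              [1/3, 1/3, 1/3],
              [0.0, 0.5, 0.5]])
X0 = np.eye(3)   # one-hot initial node embeddings
X = markov_diffusion(A, X0)
assert X.shape == (3, 3) and np.isfinite(X).all()
```

Larger `alpha` keeps embeddings closer to `X0`; smaller `alpha` smooths them more strongly over the graph.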
1 code implementation • 4 Apr 2022 • Ziyue Wu, Junyu Gao, Shucheng Huang, Changsheng Xu
Then, a commonsense-aware interaction module is designed to obtain bridged visual and text features by utilizing the learned commonsense concepts.
1 code implementation • CVPR 2022 • Junyu Gao, Mengyuan Chen, Changsheng Xu
We target at the task of weakly-supervised action localization (WSAL), where only video-level action labels are available during model training.
1 code implementation • 26 Jan 2022 • Chengcheng Ma, Xingjia Pan, Qixiang Ye, Fan Tang, WeiMing Dong, Changsheng Xu
Semi-supervised object detection has recently achieved substantial progress.
no code implementations • CVPR 2022 • Yiming Li, Xiaoshan Yang, Changsheng Xu
Humans can not only see the collection of objects in visual scenes, but also identify the relationship between objects.
3 code implementations • CVPR 2022 • Yingying Deng, Fan Tang, WeiMing Dong, Chongyang Ma, Xingjia Pan, Lei Wang, Changsheng Xu
The goal of image style transfer is to render an image with artistic features guided by a style reference while maintaining the original content.
1 code implementation • 9 Dec 2021 • Hantao Yao, Changsheng Xu
Unlike the individual-based updating mechanism, the centroid-based updating mechanism that applies the mean feature of each cluster to update the cluster memory can reduce the impact of individual samples.
Ranked #50 on Person Re-Identification on Market-1501
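A centroid-based memory update of this kind is commonly implemented as a momentum moving average toward each cluster's batch-mean feature. A minimal sketch under that assumption (the momentum value, L2 normalization, and all names are illustrative, not the paper's exact recipe):

```python
import numpy as np

def update_cluster_memory(memory, features, labels, momentum=0.9):
    """Centroid-based memory update: each cluster entry moves toward the
    mean feature of its samples in the current batch, which damps the
    influence of any single (possibly noisy) instance."""
    memory = memory.copy()
    for c in np.unique(labels):
        centroid = features[labels == c].mean(axis=0)
        memory[c] = momentum * memory[c] + (1.0 - momentum) * centroid
        memory[c] /= np.linalg.norm(memory[c])  # keep entries L2-normalized
    return memory

rng = np.random.default_rng(0)
mem = rng.normal(size=(5, 16))
mem /= np.linalg.norm(mem, axis=1, keepdims=True)
feats = rng.normal(size=(8, 16))
labels = np.array([0, 0, 1, 1, 1, 3, 3, 4])
new_mem = update_cluster_memory(mem, feats, labels)

# Cluster 2 had no samples in this batch, so its entry is unchanged.
assert np.allclose(new_mem[2], mem[2])
```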
no code implementations • 5 Dec 2021 • Pei Lv, Wentong Wang, Yunxin Wang, Yuzhen Zhang, Mingliang Xu, Changsheng Xu
In detail, when modeling social interaction, we propose a new social soft attention function, which fully considers various interaction factors among pedestrians.
1 code implementation • 2 Dec 2021 • Jun Hu, Shengsheng Qian, Quan Fang, Changsheng Xu
Recently the field has advanced from local propagation schemes that focus on local neighbors towards extended propagation schemes that can directly deal with extended neighbors consisting of both local and high-order neighbors.
no code implementations • 1 Dec 2021 • Wei Wang, Junyu Gao, Changsheng Xu
With this in mind, we design a unified causal framework to learn the deconfounded object-relevant association for more accurate and robust video object grounding.
1 code implementation • 19 Nov 2021 • Desheng Cai, Jun Hu, Quan Zhao, Shengsheng Qian, Quan Fang, Changsheng Xu
In this paper, we present GRecX, an open-source TensorFlow framework for benchmarking GNN-based recommendation models in an efficient and unified way.
no code implementations • 18 Oct 2021 • Xiaowen Huang, Jitao Sang, Jian Yu, Changsheng Xu
Cold-start recommendation is a pressing problem in contemporary online applications.
no code implementations • 29 Sep 2021 • Guanhua Zheng, Jitao Sang, Wang Haonan, Changsheng Xu
Recently, backpropagation (BP)-based feature attribution methods have been widely adopted to interpret the internal mechanisms of convolutional neural networks (CNNs), and are expected to be human-understandable (lucidity) and faithful to decision-making processes (fidelity).
1 code implementation • IEEE Transactions on Multimedia 2021 • Shengsheng Qian, Dizhan Xue, Quan Fang, Changsheng Xu
Firstly, we construct an instance representation learning branch to transform instances of different modalities into a common representation space.
1 code implementation • 3 Aug 2021 • Yifan Xu, Zhijie Zhang, Mengdan Zhang, Kekai Sheng, Ke Li, WeiMing Dong, Liqing Zhang, Changsheng Xu, Xing Sun
Vision transformers (ViTs) have recently received explosive popularity, but the huge computational cost is still a severe issue.
Ranked #11 on Efficient ViTs on ImageNet-1K (with DeiT-T)
1 code implementation • 10 Jul 2021 • Jianyu Wang, Bing-Kun Bao, Changsheng Xu
However, existing graph-based methods fail to perform multi-step reasoning well, neglecting two properties of VideoQA: (1) even for the same video, different questions may require different numbers of video clips or objects to infer the answer with relational reasoning; (2) during reasoning, appearance and motion features have a complicated interdependence, being both correlated with and complementary to each other.
Ranked #29 on Visual Question Answering (VQA) on MSRVTT-QA
no code implementations • CVPR 2021 • Chaofan Chen, Xiaoshan Yang, Changsheng Xu, Xuhui Huang, Zhe Ma
Specifically, we first employ the comparison module to explore the pairwise sample relations to learn rich sample representations in the instance-level graph.
no code implementations • 14 Jun 2021 • Pei Lv, Jianqi Fan, Xixi Nie, WeiMing Dong, Xiaoheng Jiang, Bing Zhou, Mingliang Xu, Changsheng Xu
This framework leverages user interactions to retouch and rank images for aesthetic assessment based on deep reinforcement learning (DRL), and generates a personalized aesthetic distribution that is more in line with the aesthetic preferences of different users.
4 code implementations • 30 May 2021 • Yingying Deng, Fan Tang, WeiMing Dong, Chongyang Ma, Xingjia Pan, Lei Wang, Changsheng Xu
The goal of image style transfer is to render an image with artistic features guided by a style reference while maintaining the original content.
1 code implementation • AAAI 2021 • Shengsheng Qian, Dizhan Xue, Huaiwen Zhang, Quan Fang, Changsheng Xu
To date, most existing methods transform multimodal data into a common representation space where semantic similarities between items can be directly measured across different modalities.
no code implementations • 21 Apr 2021 • Yifan Xu, Kekai Sheng, WeiMing Dong, Baoyuan Wu, Changsheng Xu, Bao-Gang Hu
However, due to unpredictable corruptions (e.g., noise and blur) in real data like web images, domain adaptation methods are increasingly required to be corruption robust on target domains.
no code implementations • 23 Mar 2021 • Xuan Ma, Xiaoshan Yang, Junyu Gao, Changsheng Xu
However, these data streams are multi-source and heterogeneous, containing complex temporal structures with local contextual and global temporal aspects, which makes feature learning and joint data utilization challenging.
1 code implementation • CVPR 2021 • Xingjia Pan, Yingguo Gao, Zhiwen Lin, Fan Tang, WeiMing Dong, Haolei Yuan, Feiyue Huang, Changsheng Xu
Weakly supervised object localization (WSOL) remains an open problem given the deficiency of finding object extent information using a classification network.
1 code implementation • 27 Jan 2021 • Jun Hu, Shengsheng Qian, Quan Fang, Youze Wang, Quan Zhao, Huaiwen Zhang, Changsheng Xu
We introduce tf_geometric, an efficient and friendly library for graph deep learning, which is compatible with both TensorFlow 1.x and 2.x.
no code implementations • ICCV 2021 • Junyu Gao, Changsheng Xu
To tackle this issue, we replace the cross-modal interaction module with a cross-modal common space, in which moment-query alignment is learned and efficient moment search can be performed.
no code implementations • ICCV 2021 • Xinhong Ma, Junyu Gao, Changsheng Xu
This paper proposes a new paradigm for unsupervised domain adaptation, termed as Active Universal Domain Adaptation (AUDA), which removes all label set assumptions and aims for not only recognizing target samples from source classes but also inferring those from target-private classes by using active learning to annotate a small budget of target data.
no code implementations • 4 Dec 2020 • Zhiyong Huang, Kekai Sheng, WeiMing Dong, Xing Mei, Chongyang Ma, Feiyue Huang, Dengwen Zhou, Changsheng Xu
For intra-domain propagation, we propose an effective self-training strategy to mitigate the noises in pseudo-labeled target domain data and improve the feature discriminability in the target domain.
no code implementations • 17 Sep 2020 • Yingying Deng, Fan Tang, Wei-Ming Dong, Haibin Huang, Chongyang Ma, Changsheng Xu
Towards this end, we propose Multi-Channel Correction network (MCCNet), which can be trained to fuse the exemplar style features and input content features for efficient style transfer while naturally maintaining the coherence of input videos.
3 code implementations • CVPR 2022 • Ming Tao, Hao Tang, Fei Wu, Xiao-Yuan Jing, Bing-Kun Bao, Changsheng Xu
To these ends, we propose a simpler but more effective Deep Fusion Generative Adversarial Networks (DF-GAN).
Ranked #4 on Text-to-Image Generation on CUB (Inception score metric)
no code implementations • 18 Jun 2020 • Guanhua Zheng, Jitao Sang, Changsheng Xu
Since the basic assumption of conventional manifold learning fails in cases of sparse and uneven data distributions, we introduce a new target, Minimum Manifold Coding (MMC), for manifold learning to encourage simple and unfolded manifolds.
no code implementations • 2 Jun 2020 • Minxuan Lin, Fan Tang, Wei-Ming Dong, Xiao Li, Chongyang Ma, Changsheng Xu
Currently, there are few methods that can perform both multimodal and multi-domain stylization simultaneously.
no code implementations • 31 May 2020 • Hantao Yao, Shaobo Min, Yongdong Zhang, Changsheng Xu
Then, an attentional graph attribute embedding is proposed to reduce the semantic bias between seen and unseen categories, which utilizes the graph operation to capture the semantic relationship between categories.
no code implementations • 30 May 2020 • Hantao Yao, Changsheng Xu
Based on this repulsion constraint, the repulsion term is proposed to reduce the similarity of distractor images that are not most similar to the probe person.
2 code implementations • 27 May 2020 • Yingying Deng, Fan Tang, Wei-Ming Dong, Wen Sun, Feiyue Huang, Changsheng Xu
Arbitrary style transfer is a significant topic with research value and application prospect.
no code implementations • 25 May 2020 • Shangxi Wu, Jitao Sang, Kaiyuan Xu, Guanhua Zheng, Changsheng Xu
Specifically, AALP consists of an adaptive feature optimization module with Guided Dropout to systematically pursue fewer high-contribution features, and an adaptive sample weighting module by setting sample-specific training weights to balance between logits pairing loss and classification loss.
1 code implementation • CVPR 2020 • Xingjia Pan, Yuqiang Ren, Kekai Sheng, Wei-Ming Dong, Haolei Yuan, Xiaowei Guo, Chongyang Ma, Changsheng Xu
However, the detection of oriented and densely packed objects remains challenging for the following inherent reasons: (1) receptive fields of neurons are all axis-aligned and of the same shape, whereas objects are usually of diverse shapes and align along various directions; (2) detection models are typically trained with generic knowledge and may not generalize well to handle specific objects at test time; (3) limited datasets hinder development on this task.
no code implementations • 26 Feb 2020 • Minxuan Lin, Yingying Deng, Fan Tang, Wei-Ming Dong, Changsheng Xu
Controllable painting generation plays a pivotal role in image stylization.
no code implementations • 28 Nov 2019 • Yi Huang, Xiaoshan Yang, Changsheng Xu
(1) It can model longitudinal heterogeneous EHRs data via capturing the 3-order correlations of different modalities and the irregular temporal impact of historical events.
no code implementations • 28 Nov 2019 • Guanhua Zheng, Jitao Sang, Houqiang Li, Jian Yu, Changsheng Xu
The derived generalization bound based on the ITID assumption identifies the significance of hypothesis invariance in guaranteeing generalization performance.
1 code implementation • Proceedings of the AAAI Conference on Artificial Intelligence 2019 • Junyu Gao, Tianzhu Zhang, Changsheng Xu
To effectively leverage the knowledge graph, we design a novel Two-Stream Graph Convolutional Network (TS-GCN) consisting of a classifier branch and an instance branch.
Ranked #5 on Zero-Shot Action Recognition on Olympics
no code implementations • 24 Jun 2019 • Zhaoquan Yuan, Siyuan Sun, Lixin Duan, Xiao Wu, Changsheng Xu
In AMN, as inspired by generative adversarial networks, we propose to learn multimodal feature representations by finding a more coherent subspace for video clips and the corresponding texts (e.g., subtitles and questions).
no code implementations • 25 May 2019 • Ting-Ting Xie, Xiaoshan Yang, Tianzhu Zhang, Changsheng Xu, Ioannis Patras
Temporal action localization has recently attracted significant interest in the Computer Vision community.
no code implementations • CVPR 2018 • Feifei Zhang, Tianzhu Zhang, Qirong Mao, Changsheng Xu
First, the encoder-decoder structure of the generator can learn a generative and discriminative identity representation for face images.
no code implementations • 3 Mar 2018 • Mingliang Xu, Zhaoyang Ge, Xiaoheng Jiang, Gaoge Cui, Pei Lv, Bing Zhou, Changsheng Xu
DigCrowd first uses the depth information of an image to segment the scene into a far-view region and a near-view region.
no code implementations • ICLR 2018 • Guanhua Zheng, Jitao Sang, Changsheng Xu
The DNN is then regarded as approximating the feature conditions with multilayer feature learning, and is proved to be a recursive solution toward the maximum entropy principle.
no code implementations • CVPR 2017 • Tianzhu Zhang, Changsheng Xu, Ming-Hsuan Yang
In this paper, we propose a multi-task correlation particle filter (MCPF) for robust visual tracking.
no code implementations • CVPR 2016 • Si Liu, Tianzhu Zhang, Xiaochun Cao, Changsheng Xu
In this paper, we propose a novel structural correlation filter (SCF) model for robust visual tracking.
no code implementations • CVPR 2015 • Tianzhu Zhang, Si Liu, Changsheng Xu, Shuicheng Yan, Bernard Ghanem, Narendra Ahuja, Ming-Hsuan Yang
Sparse representation has been applied to visual tracking by finding the best target candidate with minimal reconstruction error by use of target templates.
no code implementations • CVPR 2015 • Si Liu, Xiaodan Liang, Luoqi Liu, Xiaohui Shen, Jianchao Yang, Changsheng Xu, Liang Lin, Xiaochun Cao, Shuicheng Yan
Under the classic K Nearest Neighbor (KNN)-based nonparametric framework, the parametric Matching Convolutional Neural Network (M-CNN) is proposed to predict the matching confidence and displacements of the best matched region in the testing image for a particular semantic region in one KNN image.
no code implementations • CVPR 2014 • Tianzhu Zhang, Kui Jia, Changsheng Xu, Yi Ma, Narendra Ahuja
The proposed part matching tracker (PMT) has a number of attractive properties.