Search Results for author: Chenliang Xu

Found 85 papers, 36 papers with code

Monocular 3D Object Detection via Feature Domain Adaptation

no code implementations • ECCV 2020 • Lele Chen, Guofeng Cui, Celong Liu, Zhong Li, Ziyi Kou, Yi Xu, Chenliang Xu

Monocular 3D object detection is a challenging task due to unreliable depth, resulting in a distinct performance gap between monocular and LiDAR-based approaches.

Domain Adaptation Foreground Segmentation +3

Paper
Add Code

V2Xum-LLM: Cross-Modal Video Summarization with Temporal Prompt Instruction Tuning

no code implementations • 18 Apr 2024 • Hang Hua, Yunlong Tang, Chenliang Xu, Jiebo Luo

Recent efforts have been made to expand from unimodal to multimodal video summarization, categorizing the task into three sub-tasks based on the summary's modality: video-to-video (V2V), video-to-text (V2T), and a combination of video and text summarization (V2VT).

Text Summarization Video Summarization

Paper
Add Code

AVicuna: Audio-Visual LLM with Interleaver and Context-Boundary Alignment for Temporal Referential Dialogue

no code implementations • 24 Mar 2024 • Yunlong Tang, Daiki Shimada, Jing Bi, Chenliang Xu

In everyday communication, humans frequently use speech and gestures to refer to specific areas or objects, a process known as Referential Dialogue (RD).

Video Understanding

Paper
Add Code

Adaptive Super Resolution For One-Shot Talking-Head Generation

1 code implementation • 23 Mar 2024 • Luchuan Song, Pinxin Liu, Guojun Yin, Chenliang Xu

In this work, we propose an adaptive high-quality talking-head video generation method, which synthesizes high-resolution video without additional pre-trained modules.

Decoder Super-Resolution +2

136

Paper
Code

Forward Learning for Gradient-based Black-box Saliency Map Generation

no code implementations • 22 Mar 2024 • Zeliang Zhang, Mingqian Feng, Jinyang Jiang, Rongyi Zhu, Yijie Peng, Chenliang Xu

Gradient-based saliency maps are widely used to explain deep neural network decisions.

Paper
Add Code

Discover and Mitigate Multiple Biased Subgroups in Image Classifiers

1 code implementation • 19 Mar 2024 • Zeliang Zhang, Mingqian Feng, Zhiheng Li, Chenliang Xu

Discovering biased subgroups is the key to understanding models' failure modes and further improving models' robustness.

Dimensionality Reduction Subgroup Discovery

Paper
Code

Approximated Likelihood Ratio: A Forward-Only and Parallel Framework for Boosting Neural Network Training

no code implementations • 18 Mar 2024 • Zeliang Zhang, Jinyang Jiang, Zhuo Liu, Susan Liang, Yijie Peng, Chenliang Xu

In this paper, we introduce an approximation technique for the likelihood ratio (LR) method to alleviate computational and memory demands in gradient estimation.

Paper
Add Code

Efficiently Leveraging Linguistic Priors for Scene Text Spotting

no code implementations • 27 Feb 2024 • Nguyen Nguyen, Yapeng Tian, Chenliang Xu

This paper proposes a method that leverages linguistic knowledge from a large text corpus to replace the traditional one-hot encoding used in auto-regressive scene text spotting and recognition models.

Scene Text Recognition Text Detection +1

Paper
Add Code

OSCaR: Object State Captioning and State Change Representation

1 code implementation • 27 Feb 2024 • Nguyen Nguyen, Jing Bi, Ali Vosoughi, Yapeng Tian, Pooyan Fazli, Chenliang Xu

To address these challenges, in this paper, we introduce the Object State Captioning and State Change Representation (OSCaR) dataset and benchmark.

Change Detection Object

Paper
Code

Emo-Avatar: Efficient Monocular Video Style Avatar through Texture Rendering

no code implementations • 1 Feb 2024 • Pinxin Liu, Luchuan Song, Daoan Zhang, Hang Hua, Yunlong Tang, Huaijin Tu, Jiebo Luo, Chenliang Xu

To address the above problems, we propose the Efficient Monotonic Video Style Avatar (Emo-Avatar) through deferred neural rendering that enhances StyleGAN's capacity for producing dynamic, drivable portrait videos.

Contrastive Learning Neural Rendering

Paper
Add Code

Tri$^{2}$-plane: Volumetric Avatar Reconstruction with Feature Pyramid

1 code implementation • 17 Jan 2024 • Luchuan Song, Pinxin Liu, Lele Chen, Celong Liu, Chenliang Xu

Recent years have witnessed considerable achievements in facial avatar reconstruction with neural volume rendering.

Paper
Code

Bag of Tricks to Boost Adversarial Transferability

no code implementations • 16 Jan 2024 • Zeliang Zhang, Rongyi Zhu, Wei Yao, Xiaosen Wang, Chenliang Xu

In this work, we find that several tiny changes in the existing adversarial attacks can significantly affect the attack performance, \eg, the number of iterations and step size.

Paper
Add Code

Learning Audio Concepts from Counterfactual Natural Language

1 code implementation • 10 Jan 2024 • Ali Vosoughi, Luca Bondi, Ho-Hsiang Wu, Chenliang Xu

Conventional audio classification relied on predefined classes, lacking the ability to learn from free-form text.

Audio captioning Audio Classification +2

Paper
Code

Video Understanding with Large Language Models: A Survey

1 code implementation • 29 Dec 2023 • Yunlong Tang, Jing Bi, Siting Xu, Luchuan Song, Susan Liang, Teng Wang, Daoan Zhang, Jie An, Jingyang Lin, Rongyi Zhu, Ali Vosoughi, Chao Huang, Zeliang Zhang, Feng Zheng, JianGuo Zhang, Ping Luo, Jiebo Luo, Chenliang Xu

With the burgeoning growth of online video platforms and the escalating volume of video content, the demand for proficient video understanding tools has intensified markedly.

Video Understanding

701

Paper
Code

Scalable CP Decomposition for Tensor Learning using GPU Tensor Cores

no code implementations • 22 Nov 2023 • Zeliang Zhang, Zhuo Liu, Susan Liang, Zhiyuan Wang, Yifan Zhu, Chen Ding, Chenliang Xu

However, the application of tensor decomposition is largely hindered by the exponential increment of the computational complexity and storage consumption with the size of tensors.

Computational Efficiency Tensor Decomposition

Paper
Add Code

MISAR: A Multimodal Instructional System with Augmented Reality

1 code implementation • 18 Oct 2023 • Jing Bi, Nguyen Manh Nguyen, Ali Vosoughi, Chenliang Xu

Augmented reality (AR) requires the seamless integration of visual, auditory, and linguistic channels for optimized human-computer interaction.

Paper
Code

Separating Invisible Sounds Toward Universal Audiovisual Scene-Aware Sound Separation

no code implementations • 18 Oct 2023 • Yiyang Su, Ali Vosoughi, Shijian Deng, Yapeng Tian, Chenliang Xu

The audio-visual sound separation field assumes visible sources in videos, but this excludes invisible sounds beyond the camera's view.

Paper
Add Code

Emotional Listener Portrait: Neural Listener Head Generation with Emotion

no code implementations • ICCV 2023 • Luchuan Song, Guojun Yin, Zhenchao Jin, Xiaoyi Dong, Chenliang Xu

Listener head generation centers on generating non-verbal behaviors (e. g., smile) of a listener in reference to the information delivered by a speaker.

Paper
Add Code

Neural Acoustic Context Field: Rendering Realistic Room Impulse Response With Neural Fields

no code implementations • 27 Sep 2023 • Susan Liang, Chao Huang, Yapeng Tian, Anurag Kumar, Chenliang Xu

Room impulse response (RIR), which measures the sound propagation within an environment, is critical for synthesizing high-fidelity audio for a given environment.

Room Impulse Response (RIR)

Paper
Add Code

DAVIS: High-Quality Audio-Visual Separation with Generative Diffusion Models

no code implementations • 31 Jul 2023 • Chao Huang, Susan Liang, Yapeng Tian, Anurag Kumar, Chenliang Xu

We propose DAVIS, a Diffusion model-based Audio-VIusal Separation framework that solves the audio-visual sound source separation task through a generative manner.

Paper
Add Code

Unveiling Cross Modality Bias in Visual Question Answering: A Causal View with Possible Worlds VQA

no code implementations • 31 May 2023 • Ali Vosoughi, Shijian Deng, Songyang Zhang, Yapeng Tian, Chenliang Xu, Jiebo Luo

In this paper, we first model a confounding effect that causes language and vision bias simultaneously, then propose a counterfactual inference to remove the influence of this effect.

counterfactual Counterfactual Inference +2

Paper
Add Code

One Forward is Enough for Neural Network Training via Likelihood Ratio Method

no code implementations • 15 May 2023 • Jinyang Jiang, Zeliang Zhang, Chenliang Xu, Zhaofei Yu, Yijie Peng

While backpropagation (BP) is the mainstream approach for gradient computation in neural network training, its heavy reliance on the chain rule of differentiation constrains the designing flexibility of network architecture and training pipelines.

Paper
Add Code

Egocentric Audio-Visual Object Localization

1 code implementation • CVPR 2023 • Chao Huang, Yapeng Tian, Anurag Kumar, Chenliang Xu

In this paper, we explore the challenging egocentric audio-visual object localization task and observe that 1) egomotion commonly exists in first-person recordings, even within a short duration; 2) The out-of-view sound components can be created while wearers shift their attention.

Object Object Localization

Paper
Code

Improving Adversarial Transferability with Scheduled Step Size and Dual Example

no code implementations • 30 Jan 2023 • Zeliang Zhang, Peihan Liu, Xiaosen Wang, Chenliang Xu

Motivated by this finding, we argue that the information of adversarial perturbations near the benign sample, especially the direction, benefits more on the transferability.

Adversarial Attack

Paper
Add Code

A Whac-A-Mole Dilemma: Shortcuts Come in Multiples Where Mitigating One Amplifies Others

1 code implementation • CVPR 2023 • Zhiheng Li, Ivan Evtimov, Albert Gordo, Caner Hazirbas, Tal Hassner, Cristian Canton Ferrer, Chenliang Xu, Mark Ibrahim

Key to advancing the reliability of vision systems is understanding whether existing methods can overcome multiple shortcuts or struggle in a Whac-A-Mole game, i. e., where mitigating one shortcut amplifies reliance on others.

Ranked #1 on Out-of-Distribution Generalization on ImageNet-W

Domain Generalization Image Classification +1

Paper
Code

Discover and Mitigate Unknown Biases with Debiasing Alternate Networks

1 code implementation • 20 Jul 2022 • Zhiheng Li, Anthony Hoogs, Chenliang Xu

By training in an alternate manner, the discoverer tries to find multiple unknown biases of the classifier without any annotations of biases, and the classifier aims at unlearning the biases identified by the discoverer.

Ranked #1 on Out-of-Distribution Generalization on ImageNet-W

Action Recognition Facial Attribute Classification +1

Paper
Code

StyleT2I: Toward Compositional and High-Fidelity Text-to-Image Synthesis

1 code implementation • CVPR 2022 • Zhiheng Li, Martin Renqiang Min, Kai Li, Chenliang Xu

Based on the identified latent directions of attributes, we propose Compositional Attribute Adjustment to adjust the latent code, resulting in better compositionality of image synthesis.

Attribute Fairness +2

Paper
Code

Learning to Answer Questions in Dynamic Audio-Visual Scenarios

1 code implementation • CVPR 2022 • Guangyao Li, Yake Wei, Yapeng Tian, Chenliang Xu, Ji-Rong Wen, Di Hu

In this paper, we focus on the Audio-Visual Question Answering (AVQA) task, which aims to answer questions regarding different visual objects, sounds, and their associations in videos.

Ranked #5 on Audio-visual Question Answering on MUSIC-AVQA

audio-visual learning Audio-visual Question Answering +4

Paper
Code

Transformer-empowered Multi-scale Contextual Matching and Aggregation for Multi-contrast MRI Super-resolution

1 code implementation • CVPR 2022 • Guangyuan Li, Jun Lv, Yapeng Tian, Qi Dou, Chengyan Wang, Chenliang Xu, Jing Qin

However, existing methods still have two shortcomings: (1) they neglect that the multi-contrast features at different scales contain different anatomical details and hence lack effective mechanisms to match and fuse these features for better reconstruction; and (2) they are still deficient in capturing long-range dependencies, which are essential for the regions with complicated anatomical structures.

Super-Resolution

Paper
Code

Cross-modal Contrastive Distillation for Instructional Activity Anticipation

no code implementations • 18 Jan 2022 • Zhengyuan Yang, Jingen Liu, Jing Huang, Xiaodong He, Tao Mei, Chenliang Xu, Jiebo Luo

In this study, we aim to predict the plausible future action steps given an observation of the past and study the task of instructional activity anticipation.

Knowledge Distillation

Paper
Add Code

SpaceEdit: Learning a Unified Editing Space for Open-Domain Image Color Editing

no code implementations • CVPR 2022 • Jing Shi, Ning Xu, Haitian Zheng, Alex Smith, Jiebo Luo, Chenliang Xu

Recently, large pretrained models (e. g., BERT, StyleGAN, CLIP) show great knowledge transfer and generalization capability on various downstream tasks within their domains.

Decoder Image-to-Image Translation +2

Paper
Add Code

Anomaly Crossing: New Horizons for Video Anomaly Detection as Cross-domain Few-shot Learning

no code implementations • 12 Dec 2021 • Guangyu Sun, Zhang Liu, Lianggong Wen, Jing Shi, Chenliang Xu

Video anomaly detection aims to identify abnormal events that occurred in videos.

Anomaly Detection cross-domain few-shot learning +1

Paper
Add Code

SpaceEdit: Learning a Unified Editing Space for Open-Domain Image Editing

no code implementations • 30 Nov 2021 • Jing Shi, Ning Xu, Haitian Zheng, Alex Smith, Jiebo Luo, Chenliang Xu

Recently, large pretrained models (e. g., BERT, StyleGAN, CLIP) have shown great knowledge transfer and generalization capability on various downstream tasks within their domains.

Decoder Image-to-Image Translation +2

Paper
Add Code

Space-Time Memory Network for Sounding Object Localization in Videos

no code implementations • 10 Nov 2021 • Sizhe Li, Yapeng Tian, Chenliang Xu

Leveraging temporal synchronization and association within sight and sound is an essential step towards robust localization of sounding objects.

Object Localization

Paper
Add Code

Procedure Planning in Instructional Videos via Contextual Modeling and Model-based Policy Learning

no code implementations • ICCV 2021 • Jing Bi, Jiebo Luo, Chenliang Xu

In this work, we leverage instructional videos to study humans' decision-making processes, focusing on learning a model to plan goal-directed actions in real-life videos.

Action Recognition Bayesian Inference +1

Paper
Add Code

rQdia: Regularizing Q-Value Distributions With Image Augmentation

no code implementations • 29 Sep 2021 • Samuel Lerman, Jing Bi, Chenliang Xu

rQdia (pronounced “Arcadia”) regularizes Q-value distributions with augmented images in pixel-based deep reinforcement learning.

Continuous Control Image Augmentation +2

Paper
Add Code

Learning to Generate Scene Graph from Natural Language Supervision

1 code implementation • ICCV 2021 • Yiwu Zhong, Jing Shi, Jianwei Yang, Chenliang Xu, Yin Li

To bridge the gap between images and texts, we leverage an off-the-shelf object detector to identify and localize object instances, match labels of detected regions to concepts parsed from captions, and thus create "pseudo" labels for learning scene graph.

Graph Generation Scene Graph Generation +1

Paper
Code

Learning by Planning: Language-Guided Global Image Editing

1 code implementation • CVPR 2021 • Jing Shi, Ning Xu, Yihang Xu, Trung Bui, Franck Dernoncourt, Chenliang Xu

Recently, language-guided global image editing draws increasing attention with growing application potentials.

Paper
Code

Discover the Unknown Biased Attribute of an Image Classifier

1 code implementation • ICCV 2021 • Zhiheng Li, Chenliang Xu

To help human experts better find the AI algorithms' biases, we study a new problem in this work -- for a classifier that predicts a target attribute of the input image, discover its unknown biased attribute.

Attribute Disentanglement

Paper
Code

Zooming SlowMo: An Efficient One-Stage Framework for Space-Time Video Super-Resolution

1 code implementation • 15 Apr 2021 • Xiaoyu Xiang, Yapeng Tian, Yulun Zhang, Yun Fu, Jan P. Allebach, Chenliang Xu

A na\"ive method is to decompose it into two sub-tasks: video frame interpolation (VFI) and video super-resolution (VSR).

Space-time Video Super-resolution Video Frame Interpolation +1

897

Paper
Code

Can audio-visual integration strengthen robustness under multimodal attacks?

1 code implementation • CVPR 2021 • Yapeng Tian, Chenliang Xu

In this paper, we propose to make a systematic study on machines multisensory perception under attacks.

audio-visual learning Visual Localization

Paper
Code

Cyclic Co-Learning of Sounding Object Visual Grounding and Sound Separation

1 code implementation • CVPR 2021 • Yapeng Tian, Di Hu, Chenliang Xu

There are rich synchronized audio and visual events in our daily life.

Object Visual Grounding

Paper
Code

High-fidelity Face Tracking for AR/VR via Deep Lighting Adaptation

no code implementations • CVPR 2021 • Lele Chen, Chen Cao, Fernando de la Torre, Jason Saragih, Chenliang Xu, Yaser Sheikh

This paper addresses previous limitations by learning a deep learning lighting model, that in combination with a high-quality 3D face tracking algorithm, provides a method for subtle and robust facial motion transfer from a regular video to a 3D photo-realistic avatar.

Vocal Bursts Intensity Prediction

Paper
Add Code

A Simple Baseline for Weakly-Supervised Scene Graph Generation

no code implementations • ICCV 2021 • Jing Shi, Yiwu Zhong, Ning Xu, Yin Li, Chenliang Xu

We investigate the weakly-supervised scene graph generation, which is a challenging task since no correspondence of label and object is provided.

Contrastive Learning Graph Generation +2

Paper
Add Code

A Benchmark and Baseline for Language-Driven Image Editing

no code implementations • 5 Oct 2020 • Jing Shi, Ning Xu, Trung Bui, Franck Dernoncourt, Zheng Wen, Chenliang Xu

To solve this new task, we first present a new language-driven image editing dataset that supports both local and global editing with editing operation and mask annotations.

Paper
Add Code

Cubic Spline Smoothing Compensation for Irregularly Sampled Sequences

no code implementations • 3 Oct 2020 • Jing Shi, Jing Bi, Yingru Liu, Chenliang Xu

The marriage of recurrent neural networks and neural ordinary differential networks (ODE-RNN) is effective in modeling irregularly-observed sequences.

Paper
Add Code

Actor-Action Video Classification CSC 249/449 Spring 2020 Challenge Report

1 code implementation • 1 Aug 2020 • Jing Shi, Zhiheng Li, Haitian Zheng, Yihang Xu, Tianyou Xiao, Weitao Tan, Xiaoning Guo, Sizhe Li, Bin Yang, Zhexin Xu, Ruitao Lin, Zhongkai Shangguan, Yue Zhao, Jingwen Wang, Rohan Sharma, Surya Iyer, Ajinkya Deshmukh, Raunak Mahalik, Srishti Singh, Jayant G Rohra, Yi-Peng Zhang, Tongyu Yang, Xuan Wen, Ethan Fahnestock, Bryce Ikeda, Ian Lawson, Alan Finkelstein, Kehao Guo, Richard Magnotti, Andrew Sexton, Jeet Ketan Thaker, Yiyang Su, Chenliang Xu

This technical report summarizes submissions and compiles from Actor-Action video classification challenge held as a final project in CSC 249/449 Machine Vision course (Spring 2020) at University of Rochester

General Classification Video Classification

Paper
Code

Unified Multisensory Perception: Weakly-Supervised Audio-Visual Video Parsing

1 code implementation • ECCV 2020 • Yapeng Tian, DIngzeyu Li, Chenliang Xu

In this paper, we introduce a new problem, named audio-visual video parsing, which aims to parse a video into temporal event segments and label them as either audible, visible, or both.

Multiple Instance Learning

Paper
Code

Talking-head Generation with Rhythmic Head Motion

1 code implementation • 16 Jul 2020 • Lele Chen, Guofeng Cui, Celong Liu, Zhong Li, Ziyi Kou, Yi Xu, Chenliang Xu

When people deliver a speech, they naturally move heads, and this rhythmic head motion conveys prosodic information.

Talking Head Generation

196

Paper
Code

Graph Neural Network Based Coarse-Grained Mapping Prediction

2 code implementations • 24 Jun 2020 • Zhiheng Li, Geemi P. Wellawatte, Maghesree Chakraborty, Heta A. Gandhi, Chenliang Xu, Andrew D. White

The selection of coarse-grained (CG) mapping operators is a critical step for CG molecular dynamics (MD) simulation.

Clustering graph partitioning +2

Paper
Code

Explaining Local, Global, And Higher-Order Interactions In Deep Learning

1 code implementation • ICCV 2021 • Samuel Lerman, Chenliang Xu, Charles Venuto, Henry Kautz

We present a simple yet highly generalizable method for explaining interacting parts within a neural network's reasoning process.

Explainable artificial intelligence Object Detection +1

Paper
Code

What comprises a good talking-head video generation?: A Survey and Benchmark

1 code implementation • 7 May 2020 • Lele Chen, Guofeng Cui, Ziyi Kou, Haitian Zheng, Chenliang Xu

In this work, we present a carefully-designed benchmark for evaluating talking-head video generation with standardized dataset pre-processing strategies.

Talking Head Generation Video Generation

Paper
Code

Learning a Weakly-Supervised Video Actor-Action Segmentation Model with a Wise Selection

no code implementations • CVPR 2020 • Jie Chen, Zhiheng Li, Jiebo Luo, Chenliang Xu

Instead of blindly trusting quality-inconsistent PAs, WS^2 employs a learning-based selection to select effective PAs and a novel region integrity criterion as a stopping condition for weakly-supervised training.

Action Segmentation Segmentation +3

Paper
Add Code

Deep Grouping Model for Unified Perceptual Parsing

no code implementations • CVPR 2020 • Zhiheng Li, Wenxuan Bao, Jiayang Zheng, Chenliang Xu

The perceptual-based grouping process produces a hierarchical and compositional image representation that helps both human and machine vision systems recognize heterogeneous visual concepts.

Image Segmentation Segmentation +1

Paper
Add Code

Zooming Slow-Mo: Fast and Accurate One-Stage Space-Time Video Super-Resolution

3 code implementations • CVPR 2020 • Xiaoyu Xiang, Yapeng Tian, Yulun Zhang, Yun Fu, Jan P. Allebach, Chenliang Xu

Rather than synthesizing missing LR video frames as VFI networks do, we firstly temporally interpolate LR frame features in missing LR video frames capturing local temporal contexts by the proposed feature temporal interpolation network.

Ranked #4 on Video Frame Interpolation on Vid4 - 4x upscaling

Space-time Video Super-resolution Video Frame Interpolation +1

897

Paper
Code

Assembling Semantically-Disentangled Representations for Predictive-Generative Models via Adaptation from Synthetic Domain

no code implementations • 23 Feb 2020 • Burkay Donderici, Caleb New, Chenliang Xu

Deep neural networks can form high-level hierarchical representations of input data.

Paper
Add Code

TailorGAN: Making User-Defined Fashion Designs

2 code implementations • 17 Jan 2020 • Lele Chen, Justin Tian, Guo Li, Cheng-Haw Wu, Erh-Kan King, Kuan-Ting Chen, Shao-Hang Hsieh, Chenliang Xu

To overcome those limitations, we propose a novel self-supervised model to synthesize garment images with disentangled attributes (e. g., collar and sleeves) without paired data.

Attribute

154

Paper
Code

Deep Audio Prior

1 code implementation • 21 Dec 2019 • Yapeng Tian, Chenliang Xu, DIngzeyu Li

We are interested in applying deep networks in the absence of training dataset.

blind source separation Texture Synthesis

157

Paper
Code

Learning from Interventions using Hierarchical Policies for Safe Learning

no code implementations • 4 Dec 2019 • Jing Bi, Vikas Dhiman, Tianyou Xiao, Chenliang Xu

The recently proposed Learning from Interventions (LfI) overcomes this limitation by using an expert overseer.

Paper
Add Code

Improve CAM with Auto-adapted Segmentation and Co-supervised Augmentation

no code implementations • 17 Nov 2019 • Ziyi Kou, Guofeng Cui, Shaojie Wang, Wentian Zhao, Chenliang Xu

In this paper, we propose a confidence segmentation (ConfSeg) module that builds confidence score for each pixel in CAM without introducing additional hyper-parameters.

Object Weakly-Supervised Object Localization

Paper
Add Code

Unsupervised Pose Flow Learning for Pose Guided Synthesis

no code implementations • 30 Sep 2019 • Haitian Zheng, Lele Chen, Chenliang Xu, Jiebo Luo

Pose guided synthesis aims to generate a new image in an arbitrary target pose while preserving the appearance details from the source image.

Paper
Add Code

Hierarchical Cross-Modal Talking Face Generationwith Dynamic Pixel-Wise Loss

1 code implementation • 9 May 2019 • Lele Chen, Ross K. Maddox, Zhiyao Duan, Chenliang Xu

We devise a cascade GAN approach to generate talking face video, which is robust to different face shapes, view angles, facial characteristics, and noisy audio conditions.

255

Paper
Code

Dynamic Graph Modules for Modeling Object-Object Interactions in Activity Recognition

no code implementations • 13 Dec 2018 • Hao Huang, Luowei Zhou, Wei zhang, Jason J. Corso, Chenliang Xu

Video action recognition, a critical problem in video understanding, has been gaining increasing attention.

3D Action Recognition Object +1

Paper
Add Code

TDAN: Temporally Deformable Alignment Network for Video Super-Resolution

2 code implementations • 7 Dec 2018 • Yapeng Tian, Yulun Zhang, Yun Fu, Chenliang Xu

Video super-resolution (VSR) aims to restore a photo-realistic high-resolution (HR) video frame from both its corresponding low-resolution (LR) frame (reference frame) and multiple neighboring frames (supporting frames).

Ranked #12 on Video Super-Resolution on MSU Video Super Resolution Benchmark: Detail Restoration

Optical Flow Estimation Video Super-Resolution

400

Paper
Code

An Attempt towards Interpretable Audio-Visual Video Captioning

no code implementations • 7 Dec 2018 • Yapeng Tian, Chenxiao Guan, Justin Goodman, Marc Moore, Chenliang Xu

To achieve this, we propose a multimodal convolutional neural network-based audio-visual video captioning framework and introduce a modality-aware module for exploring modality selection during sentence generation.

Audio captioning Audio-Visual Video Captioning +3

Paper
Add Code

GAN-EM: GAN based EM learning framework

no code implementations • 2 Dec 2018 • Wentian Zhao, Shaojie Wang, Zhihuai Xie, Jing Shi, Chenliang Xu

To overcome such limitation, we propose a GAN based EM learning framework that can maximize the likelihood of images and estimate the latent variables with only the constraint of L-Lipschitz continuity.

Clustering Dimensionality Reduction +2

Paper
Add Code

How to Make a BLT Sandwich? Learning to Reason towards Understanding Web Instructional Videos

no code implementations • 2 Dec 2018 • Shaojie Wang, Wentian Zhao, Ziyi Kou, Chenliang Xu

Furthermore, we study multiple modalities including description and transcripts for the purpose of boosting video understanding.

Logical Reasoning Question Answering +1

Paper
Add Code

Navigation by Imitation in a Pedestrian-Rich Environment

no code implementations • 1 Nov 2018 • Jing Bi, Tianyou Xiao, Qiuyue Sun, Chenliang Xu

Deep neural networks trained on demonstrations of human actions give robot the ability to perform self-driving on the road.

Imitation Learning Navigate

Paper
Add Code

Weakly-Supervised Action Segmentation with Iterative Soft Boundary Assignment

1 code implementation • CVPR 2018 • Li Ding, Chenliang Xu

In this work, we address the task of weakly-supervised human action segmentation in long, untrimmed videos.

Action Segmentation

Paper
Code

Lip Movements Generation at a Glance

1 code implementation • ECCV 2018 • Lele Chen, Zhiheng Li, Ross K. Maddox, Zhiyao Duan, Chenliang Xu

In this paper, we consider a task of such: given an arbitrary audio speech and one lip image of arbitrary target identity, generate synthesized lip movements of the target identity saying the speech.

Paper
Code

Generating Talking Face Landmarks from Speech

no code implementations • 26 Mar 2018 • Sefik Emre Eskimez, Ross K. Maddox, Chenliang Xu, Zhiyao Duan

In this paper, we present a system that can generate landmark points of a talking face from an acoustic speech in real time.

Paper
Add Code

Audio-Visual Event Localization in Unconstrained Videos

2 code implementations • ECCV 2018 • Yapeng Tian, Jing Shi, Bochen Li, Zhiyao Duan, Chenliang Xu

In this paper, we introduce a novel problem of audio-visual event localization in unconstrained videos.

audio-visual event localization Temporal Localization

158

Paper
Code

MRI Tumor Segmentation with Densely Connected 3D CNN

3 code implementations • 18 Jan 2018 • Lele Chen, Yue Wu, Adora M. DSouza, Anas Z. Abidin, Axel Wismuller, Chenliang Xu

The major difficulty of our segmentation model comes with the fact that the location, structure, and shape of gliomas vary significantly among different patients.

Tumor Segmentation

159

Paper
Code

Video Action Segmentation with Hybrid Temporal Networks

no code implementations • ICLR 2018 • Li Ding, Chenliang Xu

Action segmentation as a milestone towards building automatic systems to understand untrimmed videos has received considerable attention in the recent years.

Action Segmentation Decoder +1

Paper
Add Code

Weakly Supervised Actor-Action Segmentation via Robust Multi-Task Ranking

no code implementations • CVPR 2017 • Yan Yan, Chenliang Xu, Dawen Cai, Jason J. Corso

However, current methods for detailed understanding of actor and action have significant limitations: they require large amounts of finely labeled data, and they fail to capture any internal relationship among actors and actions.

Action Classification Action Segmentation +2

Paper
Add Code

TricorNet: A Hybrid Temporal Convolutional and Recurrent Network for Video Action Segmentation

no code implementations • 22 May 2017 • Li Ding, Chenliang Xu

Action segmentation as a milestone towards building automatic systems to understand untrimmed videos has received considerable attention in the recent years.

Ranked #4 on Action Segmentation on JIGSAWS

Action Segmentation Decoder +1

Paper
Add Code

Action Understanding with Multiple Classes of Actors

no code implementations • 27 Apr 2017 • Chenliang Xu, Caiming Xiong, Jason J. Corso

Despite the rapid progress, existing works on action understanding focus strictly on one type of action agent, which we call actor---a human adult, ignoring the diversity of actions performed by other actors.

Action Recognition Action Segmentation +3

Paper
Add Code

Deep Cross-Modal Audio-Visual Generation

no code implementations • 26 Apr 2017 • Lele Chen, Sudhanshu Srivastava, Zhiyao Duan, Chenliang Xu

Being the first to explore this new problem, we compose two new datasets with pairs of images and sounds of musical performances of different instruments.

Paper
Add Code

Towards Automatic Learning of Procedures from Web Instructional Videos

1 code implementation • 28 Mar 2017 • Luowei Zhou, Chenliang Xu, Jason J. Corso

To answer this question, we introduce the problem of procedure segmentation--to segment a video procedure into category-independent procedure segments.

Dense Video Captioning Procedure Learning +1

Paper
Code

Watch What You Just Said: Image Captioning with Text-Conditional Attention

1 code implementation • 15 Jun 2016 • Luowei Zhou, Chenliang Xu, Parker Koch, Jason J. Corso

Attention mechanisms have attracted considerable interest in image captioning due to its powerful performance.

Image Captioning Language Modelling

Paper
Code

Actor-Action Semantic Segmentation with Grouping Process Models

no code implementations • CVPR 2016 • Chenliang Xu, Jason J. Corso

Actor-action semantic segmentation made an important step toward advanced video understanding problems: what action is happening; who is performing the action; and where is the action in space-time.

Semantic Segmentation Video Understanding

Paper
Add Code

LIBSVX: A Supervoxel Library and Benchmark for Early Video Processing

no code implementations • 30 Dec 2015 • Chenliang Xu, Jason J. Corso

Supervoxel segmentation has strong potential to be incorporated into early video analysis as superpixel segmentation has in image analysis.

Boundary Detection Segmentation +1

Paper
Add Code

Can Humans Fly? Action Understanding With Multiple Classes of Actors

no code implementations • CVPR 2015 • Chenliang Xu, Shao-Hang Hsieh, Caiming Xiong, Jason J. Corso

There is no work we know of on simultaneously inferring actors and actions in the video, not to mention a dataset to experiment with.

Action Recognition Action Understanding +2

Paper
Add Code

A Study of Actor and Action Semantic Retention in Video Supervoxel Segmentation

no code implementations • 13 Nov 2013 • Chenliang Xu, Richard F. Doell, Stephen José Hanson, Catherine Hanson, Jason J. Corso

In this paper, we conduct a systematic study of how well the actor and action semantics are retained in video supervoxel segmentation.

object-detection Object Detection +1

Paper
Add Code

A Thousand Frames in Just a Few Words: Lingual Description of Videos through Latent Topics and Sparse Object Stitching

no code implementations • CVPR 2013 • Pradipto Das, Chenliang Xu, Richard F. Doell, Jason J. Corso

The problem of describing images through natural language has gained importance in the computer vision community.

Video Description

Paper
Add Code

Cannot find the paper you are looking for? You can Submit a new open access paper.