no code implementations • 28 Apr 2024 • Xiaolong Li, Jiawei Mo, Ying Wang, Chethan Parameshwara, Xiaohan Fei, Ashwin Swaminathan, CJ Taylor, Zhuowen Tu, Paolo Favaro, Stefano Soatto
In this paper, we propose an effective two-stage approach named Grounded-Dreamer to generate 3D assets that can accurately follow complex, compositional text prompts while achieving high fidelity by using a pre-trained multi-view diffusion model.
no code implementations • 3 Apr 2024 • Hao Li, Yang Zou, Ying Wang, Orchid Majumder, Yusheng Xie, R. Manmatha, Ashwin Swaminathan, Zhuowen Tu, Stefano Ermon, Stefano Soatto
On the data scaling side, we show that the quality and diversity of the training set matter more than dataset size alone.
no code implementations • 18 Mar 2024 • Mengqi Zhang, Yang Fu, Zheng Ding, Sifei Liu, Zhuowen Tu, Xiaolong Wang
In this paper, we propose HOIDiffusion for generating realistic and diverse 3D hand-object interaction data.
no code implementations • 11 Mar 2024 • Haiyang Xu, Yu Lei, Zeyuan Chen, Xiang Zhang, Yue Zhao, Yilin Wang, Zhuowen Tu
We present Bayesian Diffusion Models (BDM), a prediction algorithm that performs effective Bayesian inference by tightly coupling the top-down (prior) information with the bottom-up (data-driven) procedure via joint diffusion processes.
no code implementations • 5 Mar 2024 • Yuan Gao, Kunyu Shi, Pengkai Zhu, Edouard Belval, Oren Nuriel, Srikar Appalaraju, Shabnam Ghadar, Vijay Mahadevan, Zhuowen Tu, Stefano Soatto
We propose Strongly Supervised pre-training with ScreenShots (S4) - a novel pre-training paradigm for Vision-Language Models using data from large-scale web screenshot rendering.
no code implementations • 4 Mar 2024 • Kunyu Shi, Qi Dong, Luis Goncalves, Zhuowen Tu, Stefano Soatto
Sequence-to-sequence vision-language models are showing promise, but their applicability is limited by the inference latency of their autoregressive prediction generation.
no code implementations • 12 Jan 2024 • Shengyi Qian, Weifeng Chen, Min Bai, Xiong Zhou, Zhuowen Tu, Li Erran Li
Affordance grounding refers to the task of finding the area of an object with which one can interact.
no code implementations • 28 Dec 2023 • Zheng Ding, Xuaner Zhang, Zhuowen Tu, Zhihao Xia
We propose a method to adapt a pretrained diffusion model for image restoration by simply adding noise to the input image to be restored and then denoising.
1 code implementation • 6 Dec 2023 • Zirui Wang, Zhizhou Sha, Zheng Ding, Yilin Wang, Zhuowen Tu
We present TokenCompose, a Latent Diffusion Model for text-to-image generation that achieves enhanced consistency between user-specified text prompts and model-generated images.
1 code implementation • 15 Nov 2023 • Tyler A. Chang, Catherine Arnett, Zhuowen Tu, Benjamin K. Bergen
However, concrete evidence for the effects of multilinguality on language modeling performance in individual languages remains scarce.
no code implementations • 25 Oct 2023 • Yilin Wang, Zeyuan Chen, Liangjun Zhong, Zheng Ding, Zhizhou Sha, Zhuowen Tu
In this paper, we introduce a novel generative model, Diffusion Layout Transformers without Autoencoder (Dolfin), which significantly improves the modeling capability with reduced complexity compared to existing methods.
no code implementations • 20 Sep 2023 • Haodong Duan, Mingze Xu, Bing Shuai, Davide Modolo, Zhuowen Tu, Joseph Tighe, Alessandro Bergamo
It first models the intra-person skeleton dynamics for each skeleton sequence with graph convolutions, and then uses stacked Transformer encoders to capture person interactions that are important for action recognition in general scenarios.
1 code implementation • ICCV 2023 • Zixu Zhao, Jiaze Wang, Max Horn, Yizhuo Ding, Tong He, Zechen Bai, Dominik Zietlow, Carl-Johann Simon-Gabriel, Bing Shuai, Zhuowen Tu, Thomas Brox, Bernt Schiele, Yanwei Fu, Francesco Locatello, Zheng Zhang, Tianjun Xiao
Unsupervised object-centric learning methods allow the partitioning of scenes into entities without additional localization information and are excellent candidates for reducing the annotation burden of multiple-object tracking (MOT) pipelines.
1 code implementation • 29 Aug 2023 • Tyler A. Chang, Zhuowen Tu, Benjamin K. Bergen
We quantify the final surprisal, within-run variability, age of acquisition, forgettability, and cross-run variability of learning curves for individual tokens in context.
1 code implementation • 19 Aug 2023 • Wenbo Hu, Yifan Xu, Yi Li, Weiyue Li, Zeyuan Chen, Zhuowen Tu
BLIVA demonstrates significant capability in decoding real-world images, irrespective of text presence.
1 code implementation • 2 Aug 2023 • Zheng Ding, Mengqi Zhang, Jiajun Wu, Zhuowen Tu
Feature collage systematically crops and combines partial features of the neighboring patches to predict the features of a shifted image patch, allowing the seamless generation of the entire image due to the overlap in the patch feature space.
no code implementations • ICCV 2023 • Haofu Liao, Aruni RoyChowdhury, Weijian Li, Ankan Bansal, Yuting Zhang, Zhuowen Tu, Ravi Kumar Satzoda, R. Manmatha, Vijay Mahadevan
We present a new formulation for structured information extraction (SIE) from visually rich documents.
Ranked #2 on Entity Linking on FUNSD
1 code implementation • ICCV 2023 • Xuanlin Li, Yunhao Fang, Minghua Liu, Zhan Ling, Zhuowen Tu, Hao Su
Model distillation, the process of creating smaller, faster models that maintain the performance of larger models, is a promising direction towards the solution.
1 code implementation • 11 May 2023 • Zhaoyang Zhang, Yantao Shen, Kunyu Shi, Zhaowei Cai, Jun Fang, Siqi Deng, Hao Yang, Davide Modolo, Zhuowen Tu, Stefano Soatto
We present a vision-language model whose parameters are jointly trained on all tasks and fully shared among multiple heterogeneous tasks that may interfere with each other, resulting in a single model, which we name Musketeer.
1 code implementation • CVPR 2023 • Zheng Ding, Xuaner Zhang, Zhihao Xia, Lars Jebe, Zhuowen Tu, Xiuming Zhang
On a high level, DiffusionRig learns to map simplistic renderings of 3D face models to realistic photos of a given person.
1 code implementation • ICCV 2023 • Hansheng Chen, Jiatao Gu, Anpei Chen, Wei Tian, Zhuowen Tu, Lingjie Liu, Hao Su
3D-aware image synthesis encompasses a variety of tasks, such as scene generation and novel view synthesis from images.
no code implementations • CVPR 2023 • Hao Li, Charless Fowlkes, Hao Yang, Onkar Dabeer, Zhuowen Tu, Stefano Soatto
With thousands of historical training jobs, a recommendation system can be trained to predict the model selection score given the features of the dataset and the model as input.
1 code implementation • ICCV 2023 • Xiang Zhang, Zeyuan Chen, Fangyin Wei, Zhuowen Tu
Performing holistic 3D scene understanding from a single-view observation, involving generating instance shapes and 3D scene segmentation, is a long-standing challenge.
no code implementations • ICCV 2023 • Haodong Duan, Mingze Xu, Bing Shuai, Davide Modolo, Zhuowen Tu, Joseph Tighe, Alessandro Bergamo
It first models the intra-person skeleton dynamics for each skeleton sequence with graph convolutions, and then uses stacked Transformer encoders to capture person interactions that are important for action recognition in the wild.
Ranked #2 on Human Interaction Recognition on NTU RGB+D
1 code implementation • ICCV 2023 • Xin Xu, Tianyi Xiong, Zheng Ding, Zhuowen Tu
We present a new method for open-vocabulary universal image segmentation, which is capable of performing instance, semantic, and panoptic segmentation under a unified framework.
1 code implementation • 19 Oct 2022 • Yifan Xu, Nicklas Hansen, Zirui Wang, Yung-Chieh Chan, Hao Su, Zhuowen Tu
Reinforcement Learning (RL) algorithms can solve challenging control problems directly from image observations, but they often require millions of environment interactions to do so.
no code implementations • 5 Oct 2022 • Zheng Ding, James Hou, Zhuowen Tu
In this paper, we present Position-to-Structure Attention Transformers (PS-Former), a Transformer-based algorithm for 3D point cloud recognition.
1 code implementation • 30 Sep 2022 • Jun Fang, Mingze Xu, Hao Chen, Bing Shuai, Zhuowen Tu, Joseph Tighe
In this paper, we provide an in-depth study of Stochastic Backpropagation (SBP) when training deep neural networks for standard image classification and object detection tasks.
1 code implementation • 18 Aug 2022 • Zheng Ding, Jieke Wang, Zhuowen Tu
In this paper, we tackle an emerging computer vision task, open-vocabulary universal image segmentation, which aims to perform semantic/instance/panoptic segmentation (background semantic labeling + foreground instance segmentation) for arbitrary categories of text-based descriptions at inference time.
1 code implementation • 11 Aug 2022 • Zhaowei Cai, Avinash Ravichandran, Paolo Favaro, Manchen Wang, Davide Modolo, Rahul Bhotika, Zhuowen Tu, Stefano Soatto
We study semi-supervised learning (SSL) for vision transformers (ViT), an under-explored topic despite the wide adoption of the ViT architectures to different tasks.
1 code implementation • 22 May 2022 • Tyler A. Chang, Zhuowen Tu, Benjamin K. Bergen
The subspace means differ along language-sensitive axes that are relatively stable throughout middle layers, and these axes encode information such as token vocabularies.
1 code implementation • 12 May 2022 • Yue Zhao, Yantao Shen, Yuanjun Xiong, Shuo Yang, Wei Xia, Zhuowen Tu, Bernt Schiele, Stefano Soatto
Based on this observation, we present a method, called Ensemble Logit Difference Inhibition (ELODI), to train a classification system that achieves paragon performance in both error rate and NFR, at the inference cost of a single model.
no code implementations • 12 Apr 2022 • Zhaowei Cai, Gukyeong Kwon, Avinash Ravichandran, Erhan Bas, Zhuowen Tu, Rahul Bhotika, Stefano Soatto
In this paper, we study the challenging instance-wise vision-language tasks, where the free-form language is required to align with the objects instead of the whole image.
1 code implementation • CVPR 2022 • Xiang Zhang, Yongwen Su, Subarna Tripathi, Zhuowen Tu
In this paper, we present TExt Spotting TRansformers (TESTR), a generic end-to-end text spotting framework using Transformers for text detection and recognition in the wild.
Ranked #6 on Text Spotting on ICDAR 2015
no code implementations • CVPR 2022 • Jiarui Cai, Mingze Xu, Wei Li, Yuanjun Xiong, Wei Xia, Zhuowen Tu, Stefano Soatto
We propose an online tracking algorithm that performs the object detection and data association under a common framework, capable of linking objects after a long time span.
no code implementations • 6 Jan 2022 • Pengkai Zhu, Zhaowei Cai, Yuanjun Xiong, Zhuowen Tu, Luis Goncalves, Vijay Mahadevan, Stefano Soatto
We present Contrastive Neighborhood Alignment (CNA), a manifold learning approach to maintain the topology of learned features whereby data points that are mapped to nearby representations by the source (teacher) model are also mapped to neighbors by the target (student) model.
no code implementations • CVPR 2022 • Justin Lazarow, Weijian Xu, Zhuowen Tu
In this paper, we present an end-to-end instance segmentation method that regresses a polygonal boundary for each object instance.
no code implementations • 4 Nov 2021 • Sainan Liu, Vincent Nguyen, Yuan Gao, Subarna Tripathi, Zhuowen Tu
Our proposed panoptic 3D parsing framework points to a promising direction in computer vision.
3 code implementations • ICLR 2022 • Kwonjoon Lee, Huiwen Chang, Lu Jiang, Han Zhang, Zhuowen Tu, Ce Liu
Recently, Vision Transformers (ViTs) have shown competitive performance on image recognition while requiring less vision-specific inductive biases.
Ranked #68 on Image Generation on CIFAR-10
2 code implementations • NeurIPS 2021 • Mingze Xu, Yuanjun Xiong, Hao Chen, Xinyu Li, Wei Xia, Zhuowen Tu, Stefano Soatto
We present Long Short-term TRansformer (LSTR), a temporal modeling algorithm for online action detection, which employs a long- and short-term memory mechanism to model prolonged sequence data.
Ranked #3 on Online Action Detection on TVSeries
1 code implementation • ACL 2021 • Tyler A. Chang, Yifan Xu, Weijian Xu, Zhuowen Tu
In this paper, we detail the relationship between convolutions and self-attention in natural language tasks.
no code implementations • CVPR 2021 • Rahul Duggal, Hao Zhou, Shuo Yang, Yuanjun Xiong, Wei Xia, Zhuowen Tu, Stefano Soatto
Existing systems use the same embedding model to compute representations (embeddings) for the query and gallery images.
no code implementations • ICCV 2021 • Qi Dong, Zhuowen Tu, Haofu Liao, Yuting Zhang, Vijay Mahadevan, Stefano Soatto
Computer vision applications such as visual relationship detection and human object interaction can be formulated as a composite (structured) set detection problem in which both the parts (subject, object, and predicate) and the sum (triplet as a whole) are to be detected in a hierarchical fashion.
2 code implementations • CVPR 2021 • Ke Li, Shijie Wang, Xiang Zhang, Yifan Xu, Weijian Xu, Zhuowen Tu
Here we utilize the encoder-decoder structure in Transformers to perform regression-based person and keypoint detection that is general-purpose and requires less heuristic design compared with the existing approaches.
9 code implementations • ICCV 2021 • Weijian Xu, Yifan Xu, Tyler Chang, Zhuowen Tu
In this paper, we present Co-scale conv-attentional image Transformers (CoaT), a Transformer-based image classifier equipped with co-scale and conv-attentional mechanisms.
1 code implementation • CVPR 2021 • Zhaowei Cai, Avinash Ravichandran, Subhransu Maji, Charless Fowlkes, Zhuowen Tu, Stefano Soatto
We present a plug-in replacement for batch normalization (BN) called exponential moving average normalization (EMAN), which improves the performance of existing student-teacher based self- and semi-supervised learning techniques.
2 code implementations • CVPR 2021 • Yifan Xu, Weijian Xu, David Cheung, Zhuowen Tu
In this paper, we present a joint end-to-end line segment detection algorithm using Transformers that is free of post-processing and heuristics-guided intermediate processing (edge/junction/region detection).
Ranked #1 on Line Segment Detection on York Urban Dataset (FH metric)
1 code implementation • ICLR 2021 • Weijian Xu, Yifan Xu, Huaijin Wang, Zhuowen Tu
The success of deep convolutional neural networks builds on top of the learning of effective convolution operations, capturing a hierarchy of structured features via filtering, activation, and pooling.
Ranked #15 on Few-Shot Image Classification on FC100 5-way (5-shot)
no code implementations • CVPR 2021 • Gaurav Parmar, Dacheng Li, Kwonjoon Lee, Zhuowen Tu
Our model, named dual contradistinctive generative autoencoder (DC-VAE), integrates an instance-level discriminative loss (maintaining the instance-level fidelity for the reconstruction/synthesis) with a set-level adversarial loss (encouraging the set-level fidelity for the reconstruction/synthesis), both being contradistinctive.
Ranked #2 on Image Generation on LSUN Bedroom 128 x 128
no code implementations • ECCV 2020 • Shanjiaoyang Huang, Weiqi Peng, Zhiwei Jia, Zhuowen Tu
One-pixel signature is a general representation that can be used to characterize CNN models beyond backdoor detection.
no code implementations • CVPR 2020 • Zheng Ding, Yifan Xu, Weijian Xu, Gaurav Parmar, Yang Yang, Max Welling, Zhuowen Tu
We propose an algorithm, guided variational autoencoder (Guided-VAE), that is able to learn a controllable generative model by performing latent representation disentanglement learning.
no code implementations • 13 Oct 2019 • Yifan Xu, Kening Zhang, Haoyu Dong, Yuezhou Sun, Wenlong Zhao, Zhuowen Tu
Exposure bias describes the phenomenon that a language model trained under the teacher forcing schema may perform poorly at the inference stage when its predictions are conditioned on its own previous predictions, which were unseen in the training corpus.
no code implementations • 13 Oct 2019 • Yifan Xu, Lu Dai, Udaikaran Singh, Kening Zhang, Zhuowen Tu
Neural inductive program synthesis is the task of generating instructions that produce desired outputs from given inputs.
no code implementations • ICLR 2020 • Siyang Wang, Justin Lazarow, Kwonjoon Lee, Zhuowen Tu
We tackle the problem of modeling sequential visual phenomena.
1 code implementation • CVPR 2020 • Justin Lazarow, Kwonjoon Lee, Kunyu Shi, Zhuowen Tu
Panoptic segmentation requires segments of both "things" (countable object instances) and "stuff" (uncountable and amorphous regions) within a single output.
Ranked #22 on Panoptic Segmentation on COCO test-dev
no code implementations • ICLR 2019 • Jeng-Hau Lin, Yunfan Yang, Rajesh K. Gupta, Zhuowen Tu
Memory and computation efficient deep learning architectures are crucial to the continued proliferation of machine learning capabilities to new platforms and systems.
no code implementations • CVPR 2018 • Saining Xie, Sainan Liu, Zeyu Chen, Zhuowen Tu
We tackle the problem of point cloud recognition.
no code implementations • 19 Mar 2018 • Jeng-Hau Lin, Yunfan Yang, Rajesh Gupta, Zhuowen Tu
In this paper, we tackle the problem using a strategy different from the existing literature by proposing local binary pattern networks (LBPNet), which can learn and perform binary operations in an end-to-end fashion.
1 code implementation • ECCV 2018 • Saining Xie, Chen Sun, Jonathan Huang, Zhuowen Tu, Kevin Murphy
Despite the steady progress in video analysis led by the adoption of convolutional neural networks (CNNs), the relative improvement has been less drastic than that in 2D static image classification.
Ranked #27 on Action Recognition on UCF101 (using extra training data)
no code implementations • 6 Dec 2017 • Zhiwei Jia, Haoshen Hong, Siyang Wang, Kwonjoon Lee, Zhuowen Tu
We study the intrinsic transformation of feature maps across convolutional network layers with explicit top-down control.
no code implementations • 26 Nov 2017 • Jameson Merkow, Robert Lufkin, Kim Nguyen, Stefano Soatto, Zhuowen Tu, Andrea Vedaldi
Thus, DeepRadiologyNet enables significant reduction in the workload of human radiologists by automatically filtering studies and reporting on the high-confidence ones at an operating point well below the literal error rate for US Board Certified radiologists, estimated at 0.82%.
1 code implementation • CVPR 2018 • Kwonjoon Lee, Weijian Xu, Fan Fan, Zhuowen Tu
We present Wasserstein introspective neural networks (WINN) that are both a generator and a discriminator within a single model.
no code implementations • ICCV 2017 • Justin Lazarow, Long Jin, Zhuowen Tu
We study unsupervised learning by developing a generative model built from progressively learned deep convolutional neural networks.
no code implementations • 15 Jul 2017 • Jeng-Hau Lin, Tianwei Xing, Ritchie Zhao, Zhiru Zhang, Mani Srivastava, Zhuowen Tu, Rajesh K. Gupta
State-of-the-art convolutional neural networks are enormously costly in both compute and memory, demanding massively parallel GPUs for execution.
no code implementations • 25 Apr 2017 • Justin Lazarow, Long Jin, Zhuowen Tu
We study unsupervised learning by developing introspective generative modeling (IGM) that attains a generator using progressively learned deep convolutional neural networks.
no code implementations • NeurIPS 2017 • Long Jin, Justin Lazarow, Zhuowen Tu
We propose introspective convolutional networks (ICN) that emphasize the importance of having convolutional neural networks empowered with generative capabilities.
no code implementations • 28 Nov 2016 • Long Jin, Zeyu Chen, Zhuowen Tu
Instance segmentation has attracted recent attention in computer vision and existing methods in this domain mostly have an object detection stage.
4 code implementations • 23 Nov 2016 • Liming Zhao, Jingdong Wang, Xi Li, Zhuowen Tu, Wenjun Zeng
A deep residual network, built by stacking a sequence of residual blocks, is easy to train, because identity mappings skip residual branches and thus improve information flow.
58 code implementations • CVPR 2017 • Saining Xie, Ross Girshick, Piotr Dollár, Zhuowen Tu, Kaiming He
Our simple design results in a homogeneous, multi-branch architecture that has only a few hyper-parameters to set.
Ranked #3 on Image Classification on GasHisSDB
4 code implementations • CVPR 2017 • Qibin Hou, Ming-Ming Cheng, Xiao-Wei Hu, Ali Borji, Zhuowen Tu, Philip Torr
Recent progress on saliency detection is substantial, benefiting mostly from the explosive development of Convolutional Neural Networks (CNNs).
Ranked #4 on RGB Salient Object Detection on SBU
no code implementations • 31 Jul 2016 • Peng Tang, Xinggang Wang, Baoguang Shi, Xiang Bai, Wenyu Liu, Zhuowen Tu
Our proposed FisherNet combines convolutional neural network training and Fisher Vector encoding in a single end-to-end structure.
1 code implementation • 26 May 2016 • Jameson Merkow, David Kriegman, Alison Marsden, Zhuowen Tu
In this work, we present a novel 3D-Convolutional Neural Network (CNN) architecture called I2I-3D that predicts boundary location in volumetric data.
no code implementations • 23 Nov 2015 • Patrick W. Gallagher, Shuai Tang, Zhuowen Tu
Top-down information plays a central role in human perception, but plays relatively little role in many current state-of-the-art deep networks, such as Convolutional Neural Networks (CNNs).
no code implementations • 23 Nov 2015 • Saining Xie, Xun Huang, Zhuowen Tu
Current practice in convolutional neural networks (CNN) remains largely bottom-up and the role of top-down process in CNN for pattern analysis and visual inference is not very clear.
2 code implementations • 30 Sep 2015 • Chen-Yu Lee, Patrick W. Gallagher, Zhuowen Tu
We seek to improve deep neural networks by generalizing the pooling operations that play a central role in current architectures.
Ranked #18 on Image Classification on MNIST
1 code implementation • 11 May 2015 • Liwei Wang, Chen-Yu Lee, Zhuowen Tu, Svetlana Lazebnik
One of the most promising ways of improving the performance of deep convolutional neural networks is by increasing the number of convolutional layers.
17 code implementations • ICCV 2015 • Saining Xie, Zhuowen Tu
We develop a new edge detection algorithm that tackles two important issues in this long-standing vision problem: (1) holistic image training and prediction; and (2) multi-scale and multi-level feature learning.
1 code implementation • 18 Sep 2014 • Chen-Yu Lee, Saining Xie, Patrick Gallagher, Zhengyou Zhang, Zhuowen Tu
Our proposed deeply-supervised nets (DSN) method simultaneously minimizes classification error while making the learning process of hidden layers direct and transparent.
Ranked #24 on Image Classification on MNIST
no code implementations • CVPR 2014 • Jiajun Wu, Yibiao Zhao, Jun-Yan Zhu, Siwei Luo, Zhuowen Tu
Interactive segmentation, in which a user provides a bounding box around an object of interest for image segmentation, has been applied to a variety of applications in image editing, crowdsourcing, computer vision, and medical imaging.
no code implementations • 27 May 2014 • Zhuowen Tu, Piotr Dollar, Ying-Nian Wu
Many solutions to the problem require performing logic operations such as `and', `or', and `not'.
no code implementations • 30 Jul 2013 • Jingdong Wang, Jing Wang, Gang Zeng, Zhuowen Tu, Rui Gan, Shipeng Li
The $k$-NN graph has played a central role in increasingly popular data-driven techniques for various learning and vision tasks; yet, finding an efficient and effective way to construct $k$-NN graphs remains a challenge, especially for large-scale high-dimensional data.
no code implementations • CVPR 2013 • Bo Wang, Zhuowen Tu
With the increasing availability of high-dimensional data and the demand for sophisticated data analysis algorithms, manifold learning has become a critical technique for performing dimensionality reduction and unraveling the intrinsic data structure.
no code implementations • CVPR 2013 • Jiayi Ma, Ji Zhao, Jinwen Tian, Zhuowen Tu, Alan L. Yuille
In the second step, we estimate the transformation using a robust estimator called L2E. This is the main novelty of our approach, and it enables us to deal with the noise and outliers that arise in the correspondence step.
no code implementations • CVPR 2013 • Quannan Li, Jiajun Wu, Zhuowen Tu
Obtaining effective mid-level representations has become an increasingly important task in computer vision.