no code implementations • 28 Apr 2024 • Xiaolong Li, Jiawei Mo, Ying Wang, Chethan Parameshwara, Xiaohan Fei, Ashwin Swaminathan, CJ Taylor, Zhuowen Tu, Paolo Favaro, Stefano Soatto
In this paper, we propose an effective two-stage approach named Grounded-Dreamer to generate 3D assets that can accurately follow complex, compositional text prompts while achieving high fidelity by using a pre-trained multi-view diffusion model.
no code implementations • 3 Apr 2024 • Hao Li, Yang Zou, Ying Wang, Orchid Majumder, Yusheng Xie, R. Manmatha, Ashwin Swaminathan, Zhuowen Tu, Stefano Ermon, Stefano Soatto
On the data scaling side, we show that the quality and diversity of the training set matter more than dataset size alone.
no code implementations • 18 Mar 2024 • Mengqi Zhang, Yang Fu, Zheng Ding, Sifei Liu, Zhuowen Tu, Xiaolong Wang
In this paper, we propose HOIDiffusion for generating realistic and diverse 3D hand-object interaction data.
no code implementations • 11 Mar 2024 • Haiyang Xu, Yu Lei, Zeyuan Chen, Xiang Zhang, Yue Zhao, Yilin Wang, Zhuowen Tu
We present Bayesian Diffusion Models (BDM), a prediction algorithm that performs effective Bayesian inference by tightly coupling the top-down (prior) information with the bottom-up (data-driven) procedure via joint diffusion processes.
no code implementations • 5 Mar 2024 • Yuan Gao, Kunyu Shi, Pengkai Zhu, Edouard Belval, Oren Nuriel, Srikar Appalaraju, Shabnam Ghadar, Vijay Mahadevan, Zhuowen Tu, Stefano Soatto
We propose Strongly Supervised pre-training with ScreenShots (S4) - a novel pre-training paradigm for Vision-Language Models using data from large-scale web screenshot rendering.
no code implementations • 4 Mar 2024 • Kunyu Shi, Qi Dong, Luis Goncalves, Zhuowen Tu, Stefano Soatto
Sequence-to-sequence vision-language models are showing promise, but their applicability is limited by the inference latency of their autoregressive prediction generation.
no code implementations • 12 Jan 2024 • Shengyi Qian, Weifeng Chen, Min Bai, Xiong Zhou, Zhuowen Tu, Li Erran Li
Affordance grounding refers to the task of finding the area of an object with which one can interact.
no code implementations • 28 Dec 2023 • Zheng Ding, Xuaner Zhang, Zhuowen Tu, Zhihao Xia
We propose a method to adapt a pretrained diffusion model for image restoration by simply adding noise to the input image to be restored and then denoising.
1 code implementation • 6 Dec 2023 • Zirui Wang, Zhizhou Sha, Zheng Ding, Yilin Wang, Zhuowen Tu
We present TokenCompose, a Latent Diffusion Model for text-to-image generation that achieves enhanced consistency between user-specified text prompts and model-generated images.
1 code implementation • 15 Nov 2023 • Tyler A. Chang, Catherine Arnett, Zhuowen Tu, Benjamin K. Bergen
However, concrete evidence for the effects of multilinguality on language modeling performance in individual languages remains scarce.
no code implementations • 25 Oct 2023 • Yilin Wang, Zeyuan Chen, Liangjun Zhong, Zheng Ding, Zhizhou Sha, Zhuowen Tu
In this paper, we introduce a novel generative model, Diffusion Layout Transformers without Autoencoder (Dolfin), which significantly improves the modeling capability with reduced complexity compared to existing methods.
no code implementations • 20 Sep 2023 • Haodong Duan, Mingze Xu, Bing Shuai, Davide Modolo, Zhuowen Tu, Joseph Tighe, Alessandro Bergamo
It first models the intra-person skeleton dynamics for each skeleton sequence with graph convolutions, and then uses stacked Transformer encoders to capture person interactions that are important for action recognition in general scenarios.
1 code implementation • ICCV 2023 • Zixu Zhao, Jiaze Wang, Max Horn, Yizhuo Ding, Tong He, Zechen Bai, Dominik Zietlow, Carl-Johann Simon-Gabriel, Bing Shuai, Zhuowen Tu, Thomas Brox, Bernt Schiele, Yanwei Fu, Francesco Locatello, Zheng Zhang, Tianjun Xiao
Unsupervised object-centric learning methods allow the partitioning of scenes into entities without additional localization information and are excellent candidates for reducing the annotation burden of multiple-object tracking (MOT) pipelines.
1 code implementation • 29 Aug 2023 • Tyler A. Chang, Zhuowen Tu, Benjamin K. Bergen
We quantify the final surprisal, within-run variability, age of acquisition, forgettability, and cross-run variability of learning curves for individual tokens in context.
1 code implementation • 19 Aug 2023 • Wenbo Hu, Yifan Xu, Yi Li, Weiyue Li, Zeyuan Chen, Zhuowen Tu
BLIVA demonstrates significant capability in decoding real-world images, irrespective of text presence.
1 code implementation • 2 Aug 2023 • Zheng Ding, Mengqi Zhang, Jiajun Wu, Zhuowen Tu
Feature collage systematically crops and combines partial features of the neighboring patches to predict the features of a shifted image patch, allowing the seamless generation of the entire image due to the overlap in the patch feature space.
no code implementations • ICCV 2023 • Haofu Liao, Aruni RoyChowdhury, Weijian Li, Ankan Bansal, Yuting Zhang, Zhuowen Tu, Ravi Kumar Satzoda, R. Manmatha, Vijay Mahadevan
We present a new formulation for structured information extraction (SIE) from visually rich documents.
Ranked #2 on Entity Linking on FUNSD
1 code implementation • ICCV 2023 • Xuanlin Li, Yunhao Fang, Minghua Liu, Zhan Ling, Zhuowen Tu, Hao Su
Model distillation, the process of creating smaller, faster models that maintain the performance of larger models, is a promising direction towards the solution.
1 code implementation • 11 May 2023 • Zhaoyang Zhang, Yantao Shen, Kunyu Shi, Zhaowei Cai, Jun Fang, Siqi Deng, Hao Yang, Davide Modolo, Zhuowen Tu, Stefano Soatto
We present a vision-language model whose parameters are jointly trained on all tasks and fully shared among multiple heterogeneous tasks that may interfere with each other, resulting in a single model, which we name Musketeer.
1 code implementation • CVPR 2023 • Zheng Ding, Xuaner Zhang, Zhihao Xia, Lars Jebe, Zhuowen Tu, Xiuming Zhang
On a high level, DiffusionRig learns to map simplistic renderings of 3D face models to realistic photos of a given person.
1 code implementation • ICCV 2023 • Hansheng Chen, Jiatao Gu, Anpei Chen, Wei Tian, Zhuowen Tu, Lingjie Liu, Hao Su
3D-aware image synthesis encompasses a variety of tasks, such as scene generation and novel view synthesis from images.
no code implementations • CVPR 2023 • Hao Li, Charless Fowlkes, Hao Yang, Onkar Dabeer, Zhuowen Tu, Stefano Soatto
With thousands of historical training jobs, a recommendation system can be trained to predict the model selection score given the features of the dataset and the model as input.
1 code implementation • ICCV 2023 • Xiang Zhang, Zeyuan Chen, Fangyin Wei, Zhuowen Tu
Performing holistic 3D scene understanding from a single-view observation, involving generating instance shapes and 3D scene segmentation, is a long-standing challenge.
no code implementations • ICCV 2023 • Haodong Duan, Mingze Xu, Bing Shuai, Davide Modolo, Zhuowen Tu, Joseph Tighe, Alessandro Bergamo
It first models the intra-person skeleton dynamics for each skeleton sequence with graph convolutions, and then uses stacked Transformer encoders to capture person interactions that are important for action recognition in the wild.
Ranked #2 on Human Interaction Recognition on NTU RGB+D
1 code implementation • ICCV 2023 • Xin Xu, Tianyi Xiong, Zheng Ding, Zhuowen Tu
We present a new method for open-vocabulary universal image segmentation, which is capable of performing instance, semantic, and panoptic segmentation under a unified framework.
1 code implementation • 19 Oct 2022 • Yifan Xu, Nicklas Hansen, Zirui Wang, Yung-Chieh Chan, Hao Su, Zhuowen Tu
Reinforcement Learning (RL) algorithms can solve challenging control problems directly from image observations, but they often require millions of environment interactions to do so.
no code implementations • 5 Oct 2022 • Zheng Ding, James Hou, Zhuowen Tu
In this paper, we present Position-to-Structure Attention Transformers (PS-Former), a Transformer-based algorithm for 3D point cloud recognition.
1 code implementation • 30 Sep 2022 • Jun Fang, Mingze Xu, Hao Chen, Bing Shuai, Zhuowen Tu, Joseph Tighe
In this paper, we provide an in-depth study of Stochastic Backpropagation (SBP) when training deep neural networks for standard image classification and object detection tasks.
1 code implementation • 18 Aug 2022 • Zheng Ding, Jieke Wang, Zhuowen Tu
In this paper, we tackle an emerging computer vision task, open-vocabulary universal image segmentation, which aims to perform semantic/instance/panoptic segmentation (background semantic labeling + foreground instance segmentation) for arbitrary categories of text-based descriptions at inference time.
1 code implementation • 11 Aug 2022 • Zhaowei Cai, Avinash Ravichandran, Paolo Favaro, Manchen Wang, Davide Modolo, Rahul Bhotika, Zhuowen Tu, Stefano Soatto
We study semi-supervised learning (SSL) for vision transformers (ViT), an under-explored topic despite the wide adoption of the ViT architectures to different tasks.
1 code implementation • 22 May 2022 • Tyler A. Chang, Zhuowen Tu, Benjamin K. Bergen
The subspace means differ along language-sensitive axes that are relatively stable throughout middle layers, and these axes encode information such as token vocabularies.
1 code implementation • 12 May 2022 • Yue Zhao, Yantao Shen, Yuanjun Xiong, Shuo Yang, Wei Xia, Zhuowen Tu, Bernt Schiele, Stefano Soatto
Based on this observation, we present a method, called Ensemble Logit Difference Inhibition (ELODI), to train a classification system that achieves paragon performance in both error rate and NFR, at the inference cost of a single model.
no code implementations • 12 Apr 2022 • Zhaowei Cai, Gukyeong Kwon, Avinash Ravichandran, Erhan Bas, Zhuowen Tu, Rahul Bhotika, Stefano Soatto
In this paper, we study the challenging instance-wise vision-language tasks, where the free-form language is required to align with the objects instead of the whole image.
1 code implementation • CVPR 2022 • Xiang Zhang, Yongwen Su, Subarna Tripathi, Zhuowen Tu
In this paper, we present TExt Spotting TRansformers (TESTR), a generic end-to-end text spotting framework using Transformers for text detection and recognition in the wild.
Ranked #6 on Text Spotting on ICDAR 2015
no code implementations • CVPR 2022 • Jiarui Cai, Mingze Xu, Wei Li, Yuanjun Xiong, Wei Xia, Zhuowen Tu, Stefano Soatto
We propose an online tracking algorithm that performs the object detection and data association under a common framework, capable of linking objects after a long time span.
no code implementations • 6 Jan 2022 • Pengkai Zhu, Zhaowei Cai, Yuanjun Xiong, Zhuowen Tu, Luis Goncalves, Vijay Mahadevan, Stefano Soatto
We present Contrastive Neighborhood Alignment (CNA), a manifold learning approach to maintain the topology of learned features whereby data points that are mapped to nearby representations by the source (teacher) model are also mapped to neighbors by the target (student) model.
no code implementations • CVPR 2022 • Justin Lazarow, Weijian Xu, Zhuowen Tu
In this paper, we present an end-to-end instance segmentation method that regresses a polygonal boundary for each object instance.
no code implementations • 4 Nov 2021 • Sainan Liu, Vincent Nguyen, Yuan Gao, Subarna Tripathi, Zhuowen Tu
Our proposed panoptic 3D parsing framework points to a promising direction in computer vision.
3 code implementations • ICLR 2022 • Kwonjoon Lee, Huiwen Chang, Lu Jiang, Han Zhang, Zhuowen Tu, Ce Liu
Recently, Vision Transformers (ViTs) have shown competitive performance on image recognition while requiring less vision-specific inductive biases.
Ranked #68 on Image Generation on CIFAR-10
2 code implementations • NeurIPS 2021 • Mingze Xu, Yuanjun Xiong, Hao Chen, Xinyu Li, Wei Xia, Zhuowen Tu, Stefano Soatto
We present Long Short-term TRansformer (LSTR), a temporal modeling algorithm for online action detection, which employs a long- and short-term memory mechanism to model prolonged sequence data.
Ranked #3 on Online Action Detection on TVSeries
1 code implementation • ACL 2021 • Tyler A. Chang, Yifan Xu, Weijian Xu, Zhuowen Tu
In this paper, we detail the relationship between convolutions and self-attention in natural language tasks.
no code implementations • CVPR 2021 • Rahul Duggal, Hao Zhou, Shuo Yang, Yuanjun Xiong, Wei Xia, Zhuowen Tu, Stefano Soatto
Existing systems use the same embedding model to compute representations (embeddings) for the query and gallery images.
no code implementations • ICCV 2021 • Qi Dong, Zhuowen Tu, Haofu Liao, Yuting Zhang, Vijay Mahadevan, Stefano Soatto
Computer vision applications such as visual relationship detection and human object interaction can be formulated as a composite (structured) set detection problem in which both the parts (subject, object, and predicate) and the sum (triplet as a whole) are to be detected in a hierarchical fashion.
2 code implementations • CVPR 2021 • Ke Li, Shijie Wang, Xiang Zhang, Yifan Xu, Weijian Xu, Zhuowen Tu
Here we utilize the encoder-decoder structure in Transformers to perform regression-based person and keypoint detection that is general-purpose and requires less heuristic design compared with the existing approaches.
9 code implementations • ICCV 2021 • Weijian Xu, Yifan Xu, Tyler Chang, Zhuowen Tu
In this paper, we present Co-scale conv-attentional image Transformers (CoaT), a Transformer-based image classifier equipped with co-scale and conv-attentional mechanisms.
1 code implementation • CVPR 2021 • Zhaowei Cai, Avinash Ravichandran, Subhransu Maji, Charless Fowlkes, Zhuowen Tu, Stefano Soatto
We present a plug-in replacement for batch normalization (BN) called exponential moving average normalization (EMAN), which improves the performance of existing student-teacher based self- and semi-supervised learning techniques.
2 code implementations • CVPR 2021 • Yifan Xu, Weijian Xu, David Cheung, Zhuowen Tu
In this paper, we present a joint end-to-end line segment detection algorithm using Transformers that is free of post-processing and heuristics-guided intermediate processing (edge/junction/region detection).
Ranked #1 on Line Segment Detection on York Urban Dataset (FH metric)
1 code implementation • ICLR 2021 • Weijian Xu, Yifan Xu, Huaijin Wang, Zhuowen Tu
The success of deep convolutional neural networks builds on top of the learning of effective convolution operations, capturing a hierarchy of structured features via filtering, activation, and pooling.
Ranked #15 on Few-Shot Image Classification on FC100 5-way (5-shot)
no code implementations • CVPR 2021 • Gaurav Parmar, Dacheng Li, Kwonjoon Lee, Zhuowen Tu
Our model, named dual contradistinctive generative autoencoder (DC-VAE), integrates an instance-level discriminative loss (maintaining the instance-level fidelity for the reconstruction/synthesis) with a set-level adversarial loss (encouraging the set-level fidelity for the reconstruction/synthesis), both being contradistinctive.
Ranked #2 on Image Generation on LSUN Bedroom 128 x 128
no code implementations • ECCV 2020 • Shanjiaoyang Huang, Weiqi Peng, Zhiwei Jia, Zhuowen Tu
One-pixel signature is a general representation that can be used to characterize CNN models beyond backdoor detection.
no code implementations • CVPR 2020 • Zheng Ding, Yifan Xu, Weijian Xu, Gaurav Parmar, Yang Yang, Max Welling, Zhuowen Tu
We propose an algorithm, guided variational autoencoder (Guided-VAE), that is able to learn a controllable generative model by performing latent representation disentanglement learning.
no code implementations • 13 Oct 2019 • Yifan Xu, Kening Zhang, Haoyu Dong, Yuezhou Sun, Wenlong Zhao, Zhuowen Tu
Exposure bias describes the phenomenon that a language model trained under the teacher forcing schema may perform poorly at the inference stage when its predictions are conditioned on its own previous predictions, which were unseen in the training corpus.
no code implementations • 13 Oct 2019 • Yifan Xu, Lu Dai, Udaikaran Singh, Kening Zhang, Zhuowen Tu
Neural inductive program synthesis is the task of generating instructions that produce desired outputs from given inputs.
no code implementations • ICLR 2020 • Siyang Wang, Justin Lazarow, Kwonjoon Lee, Zhuowen Tu
We tackle the problem of modeling sequential visual phenomena.
1 code implementation • CVPR 2020 • Justin Lazarow, Kwonjoon Lee, Kunyu Shi, Zhuowen Tu
Panoptic segmentation requires segments of both "things" (countable object instances) and "stuff" (uncountable and amorphous regions) within a single output.
Ranked #22 on Panoptic Segmentation on COCO test-dev
no code implementations • ICLR 2019 • Jeng-Hau Lin, Yunfan Yang, Rajesh K. Gupta, Zhuowen Tu
Memory and computation efficient deep learning architectures are crucial to the continued proliferation of machine learning capabilities to new platforms and systems.
no code implementations • CVPR 2018 • Saining Xie, Sainan Liu, Zeyu Chen, Zhuowen Tu
We tackle the problem of point cloud recognition.
no code implementations • 19 Mar 2018 • Jeng-Hau Lin, Yunfan Yang, Rajesh Gupta, Zhuowen Tu
In this paper, we tackle the problem using a strategy different from the existing literature by proposing local binary pattern networks (LBPNet), which can learn and perform binary operations in an end-to-end fashion.
1 code implementation • ECCV 2018 • Saining Xie, Chen Sun, Jonathan Huang, Zhuowen Tu, Kevin Murphy
Despite the steady progress in video analysis led by the adoption of convolutional neural networks (CNNs), the relative improvement has been less drastic than that in 2D static image classification.
Ranked #27 on Action Recognition on UCF101 (using extra training data)
no code implementations • 6 Dec 2017 • Zhiwei Jia, Haoshen Hong, Siyang Wang, Kwonjoon Lee, Zhuowen Tu
We study the intrinsic transformation of feature maps across convolutional network layers with explicit top-down control.
no code implementations • 26 Nov 2017 • Jameson Merkow, Robert Lufkin, Kim Nguyen, Stefano Soatto, Zhuowen Tu, Andrea Vedaldi
Thus, DeepRadiologyNet enables significant reduction in the workload of human radiologists by automatically filtering studies and reporting on the high-confidence ones at an operating point well below the literal error rate for US Board Certified radiologists, estimated at 0.82%.
1 code implementation • CVPR 2018 • Kwonjoon Lee, Weijian Xu, Fan Fan, Zhuowen Tu
We present Wasserstein introspective neural networks (WINN) that are both a generator and a discriminator within a single model.
no code implementations • ICCV 2017 • Justin Lazarow, Long Jin, Zhuowen Tu
We study unsupervised learning by developing a generative model built from progressively learned deep convolutional neural networks.
no code implementations • 15 Jul 2017 • Jeng-Hau Lin, Tianwei Xing, Ritchie Zhao, Zhiru Zhang, Mani Srivastava, Zhuowen Tu, Rajesh K. Gupta
State-of-the-art convolutional neural networks are enormously costly in both compute and memory, demanding massively parallel GPUs for execution.
no code implementations • 25 Apr 2017 • Justin Lazarow, Long Jin, Zhuowen Tu
We study unsupervised learning by developing introspective generative modeling (IGM) that attains a generator using progressively learned deep convolutional neural networks.
no code implementations • NeurIPS 2017 • Long Jin, Justin Lazarow, Zhuowen Tu
We propose introspective convolutional networks (ICN) that emphasize the importance of having convolutional neural networks empowered with generative capabilities.
no code implementations • 28 Nov 2016 • Long Jin, Zeyu Chen, Zhuowen Tu
Instance segmentation has attracted recent attention in computer vision and existing methods in this domain mostly have an object detection stage.
4 code implementations • 23 Nov 2016 • Liming Zhao, Jingdong Wang, Xi Li, Zhuowen Tu, Wenjun Zeng
A deep residual network, built by stacking a sequence of residual blocks, is easy to train, because identity mappings skip residual branches and thus improve information flow.
58 code implementations • CVPR 2017 • Saining Xie, Ross Girshick, Piotr Dollár, Zhuowen Tu, Kaiming He
Our simple design results in a homogeneous, multi-branch architecture that has only a few hyper-parameters to set.
Ranked #3 on Image Classification on GasHisSDB
4 code implementations • CVPR 2017 • Qibin Hou, Ming-Ming Cheng, Xiao-Wei Hu, Ali Borji, Zhuowen Tu, Philip Torr
Recent progress on saliency detection is substantial, benefiting mostly from the explosive development of Convolutional Neural Networks (CNNs).
Ranked #4 on RGB Salient Object Detection on SBU
no code implementations • 31 Jul 2016 • Peng Tang, Xinggang Wang, Baoguang Shi, Xiang Bai, Wenyu Liu, Zhuowen Tu
Our proposed FisherNet combines convolutional neural network training and Fisher Vector encoding in a single end-to-end structure.
1 code implementation • 26 May 2016 • Jameson Merkow, David Kriegman, Alison Marsden, Zhuowen Tu
In this work, we present a novel 3D-Convolutional Neural Network (CNN) architecture called I2I-3D that predicts boundary location in volumetric data.
no code implementations • 23 Nov 2015 • Patrick W. Gallagher, Shuai Tang, Zhuowen Tu
Top-down information plays a central role in human perception, but plays relatively little role in many current state-of-the-art deep networks, such as Convolutional Neural Networks (CNNs).
no code implementations • 23 Nov 2015 • Saining Xie, Xun Huang, Zhuowen Tu
Current practice in convolutional neural networks (CNN) remains largely bottom-up and the role of top-down process in CNN for pattern analysis and visual inference is not very clear.
2 code implementations • 30 Sep 2015 • Chen-Yu Lee, Patrick W. Gallagher, Zhuowen Tu
We seek to improve deep neural networks by generalizing the pooling operations that play a central role in current architectures.
Ranked #18 on Image Classification on MNIST
1 code implementation • 11 May 2015 • Liwei Wang, Chen-Yu Lee, Zhuowen Tu, Svetlana Lazebnik
One of the most promising ways of improving the performance of deep convolutional neural networks is by increasing the number of convolutional layers.
17 code implementations • ICCV 2015 • Saining Xie, Zhuowen Tu
We develop a new edge detection algorithm that tackles two important issues in this long-standing vision problem: (1) holistic image training and prediction; and (2) multi-scale and multi-level feature learning.
1 code implementation • 18 Sep 2014 • Chen-Yu Lee, Saining Xie, Patrick Gallagher, Zhengyou Zhang, Zhuowen Tu
Our proposed deeply-supervised nets (DSN) method simultaneously minimizes classification error while making the learning process of hidden layers direct and transparent.
Ranked #24 on Image Classification on MNIST
no code implementations • CVPR 2014 • Jiajun Wu, Yibiao Zhao, Jun-Yan Zhu, Siwei Luo, Zhuowen Tu
Interactive segmentation, in which a user provides a bounding box around an object of interest for image segmentation, has been applied to a variety of applications in image editing, crowdsourcing, computer vision, and medical imaging.
no code implementations • 27 May 2014 • Zhuowen Tu, Piotr Dollar, Ying-Nian Wu
Many solutions to the problem require performing logic operations such as `and', `or', and `not'.
no code implementations • 30 Jul 2013 • Jingdong Wang, Jing Wang, Gang Zeng, Zhuowen Tu, Rui Gan, Shipeng Li
The $k$-NN graph has played a central role in increasingly popular data-driven techniques for various learning and vision tasks; yet, finding an efficient and effective way to construct $k$-NN graphs remains a challenge, especially for large-scale high-dimensional data.
no code implementations • CVPR 2013 • Bo Wang, Zhuowen Tu
With the increasing availability of high-dimensional data and the demand for sophisticated data analysis algorithms, manifold learning has become a critical technique for performing dimensionality reduction and unraveling the intrinsic data structure.
no code implementations • CVPR 2013 • Jiayi Ma, Ji Zhao, Jinwen Tian, Zhuowen Tu, Alan L. Yuille
In the second step, we estimate the transformation using a robust estimator called L2E. This is the main novelty of our approach, and it enables us to deal with the noise and outliers that arise in the correspondence step.
no code implementations • CVPR 2013 • Quannan Li, Jiajun Wu, Zhuowen Tu
Obtaining effective mid-level representations has become an increasingly important task in computer vision.