1 code implementation • 27 Feb 2024 • Chengcheng Wang, Zhiwei Hao, Yehui Tang, Jianyuan Guo, Yujie Yang, Kai Han, Yunhe Wang
In this paper, we propose the SAM-DiffSR model, which utilizes the fine-grained structure information from SAM to modulate the sampled noise, improving image quality without additional computational cost during inference.
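As a rough illustration of structure-aware noise, here is a minimal PyTorch sketch in which a per-segment offset derived from a SAM mask shifts the mean of the Gaussian noise used in the diffusion forward process; the function names and the exact modulation rule are assumptions, not the paper's implementation.

```python
# Hedged sketch: shift the mean of diffusion noise per SAM segment.
import torch

def modulate_noise(noise: torch.Tensor, seg_mask: torch.Tensor,
                   offsets: torch.Tensor) -> torch.Tensor:
    """noise: (B, C, H, W); seg_mask: (B, H, W) integer segment ids;
    offsets: (num_segments,) learned per-segment mean shifts."""
    shift = offsets[seg_mask]              # (B, H, W) per-pixel shift
    return noise + shift.unsqueeze(1)      # broadcast over channels

B, C, H, W = 2, 3, 8, 8
noise = torch.randn(B, C, H, W)
seg = torch.randint(0, 4, (B, H, W))       # stand-in for a SAM segmentation
offsets = torch.zeros(4, requires_grad=True)  # trainable structural offsets
print(modulate_noise(noise, seg, offsets).shape)  # torch.Size([2, 3, 8, 8])
```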
1 code implementation • 7 Feb 2024 • Jianyuan Guo, Zhiwei Hao, Chengcheng Wang, Yehui Tang, Han Wu, Han Hu, Kai Han, Chang Xu
Training general-purpose vision models on purely sequential visual data, eschewing linguistic inputs, has heralded a new frontier in visual understanding.
1 code implementation • 6 Feb 2024 • Jianyuan Guo, Hanting Chen, Chengcheng Wang, Kai Han, Chang Xu, Yunhe Wang
Recent advancements in large language models have sparked interest in their extraordinary and near-superhuman capabilities, leading researchers to explore methods for evaluating and optimizing these abilities, a problem referred to as superalignment.
no code implementations • 5 Feb 2024 • Yehui Tang, Yunhe Wang, Jianyuan Guo, Zhijun Tu, Kai Han, Hailin Hu, DaCheng Tao
Model compression methods reduce the memory and computational cost of Transformers, a necessary step for deploying large language/vision models on practical devices.
no code implementations • 27 Dec 2023 • Yunhe Wang, Hanting Chen, Yehui Tang, Tianyu Guo, Kai Han, Ying Nie, Xutao Wang, Hailin Hu, Zheyuan Bai, Yun Wang, Fangcheng Liu, Zhicheng Liu, Jianyuan Guo, Sinan Zeng, Yinchen Zhang, Qinghua Xu, Qun Liu, Jun Yao, Chao Xu, DaCheng Tao
We then demonstrate through carefully designed ablations that the proposed approach is significantly effective for enhancing model nonlinearity; thus, we present a new efficient model architecture for establishing modern LLMs, namely, PanGu-$\pi$.
1 code implementation • NeurIPS 2023 • Zhiwei Hao, Jianyuan Guo, Kai Han, Yehui Tang, Han Hu, Yunhe Wang, Chang Xu
To tackle the challenge in distilling heterogeneous models, we propose a simple yet effective one-for-all KD framework called OFA-KD, which significantly improves the distillation performance between heterogeneous architectures.
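To make the heterogeneous-distillation idea concrete, the sketch below projects student features into the teacher's logit space so the KD loss is architecture-agnostic; the projector design and loss form here are simplified assumptions rather than the OFA-KD implementation.

```python
# Minimal sketch of logits-space distillation between heterogeneous models.
import torch
import torch.nn as nn
import torch.nn.functional as F

class LogitProjector(nn.Module):
    def __init__(self, feat_dim: int, num_classes: int):
        super().__init__()
        self.proj = nn.Linear(feat_dim, num_classes)

    def forward(self, student_feat):
        return self.proj(student_feat)     # map features to class logits

def kd_loss(student_logits, teacher_logits, T: float = 4.0):
    """Standard temperature-scaled KL divergence used in distillation."""
    p_t = F.softmax(teacher_logits / T, dim=-1)
    log_p_s = F.log_softmax(student_logits / T, dim=-1)
    return F.kl_div(log_p_s, p_t, reduction="batchmean") * T * T

proj = LogitProjector(feat_dim=192, num_classes=1000)
s_feat = torch.randn(8, 192)               # e.g. a CNN student's features
t_logits = torch.randn(8, 1000)            # e.g. a ViT teacher's logits
print(kd_loss(proj(s_feat), t_logits).item())
```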
3 code implementations • NeurIPS 2023 • Chengcheng Wang, Wei He, Ying Nie, Jianyuan Guo, Chuanjian Liu, Kai Han, Yunhe Wang
In the past years, YOLO-series models have emerged as the leading approaches in the area of real-time object detection.
no code implementations • 26 Jun 2023 • Kai Han, Yunhe Wang, Jianyuan Guo, Enhua Wu
In the language domain, LLaMA-1B enhanced with ParameterNet achieves 2% higher accuracy than vanilla LLaMA.
1 code implementation • 25 May 2023 • Zhiwei Hao, Jianyuan Guo, Kai Han, Han Hu, Chang Xu, Yunhe Wang
The tremendous success of large models trained on extensive datasets demonstrates that scale is a key ingredient in achieving superior results.
4 code implementations • NeurIPS 2023 • Hanting Chen, Yunhe Wang, Jianyuan Guo, DaCheng Tao
In this study, we introduce VanillaNet, a neural network architecture that embraces elegance in design.
1 code implementation • CVPR 2023 • Haoqing Wang, Yehui Tang, Yunhe Wang, Jianyuan Guo, Zhi-Hong Deng, Kai Han
The lower layers are not explicitly guided and the interaction among their patches is only used for calculating new activations.
1 code implementation • 13 Dec 2022 • Jianyuan Guo, Kai Han, Han Wu, Yehui Tang, Yunhe Wang, Chang Xu
This paper presents FastMIM, a simple and generic framework for expediting masked image modeling with the following two steps: (i) pre-training vision backbones with low-resolution input images; and (ii) reconstructing Histograms of Oriented Gradients (HOG) feature instead of original RGB values of the input images.
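The two FastMIM ingredients can be sketched directly: (i) downsample the input and (ii) regress HOG features instead of raw pixels. The masking and patch bookkeeping below are deliberately simplified assumptions, using `skimage.feature.hog` for the target.

```python
# Hedged sketch of FastMIM's two steps: low-res input + HOG regression target.
import numpy as np
import torch
import torch.nn.functional as F
from skimage.feature import hog

def hog_target(image: np.ndarray) -> torch.Tensor:
    """image: (H, W) grayscale in [0, 1]; returns a flat HOG feature vector."""
    feats = hog(image, orientations=9, pixels_per_cell=(8, 8),
                cells_per_block=(1, 1), feature_vector=True)
    return torch.from_numpy(feats).float()

img = np.random.rand(224, 224)
low_res = img[::2, ::2]                    # (i) 112x112 low-resolution input
target = hog_target(low_res)               # (ii) HOG reconstruction target
pred = torch.zeros_like(target)            # stand-in for decoder output
loss = F.mse_loss(pred, target)            # regress HOG instead of RGB
print(target.shape, loss.item())
```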
15 code implementations • 23 Nov 2022 • Yehui Tang, Kai Han, Jianyuan Guo, Chang Xu, Chao Xu, Yunhe Wang
The convolutional operation can only capture local information in a window region, which prevents performance from being further improved.
no code implementations • 2 Sep 2022 • Han Wu, Jie Yin, Bala Rajaratnam, Jianyuan Guo
By jointly capturing three levels of relational information (entity-level, triplet-level, and context-level), HiRe can effectively learn and refine the meta representation of few-shot relations, and consequently generalizes well to unseen relations.
11 code implementations • 1 Jun 2022 • Kai Han, Yunhe Wang, Jianyuan Guo, Yehui Tang, Enhua Wu
In this paper, we propose to represent the image as a graph structure and introduce a new Vision GNN (ViG) architecture to extract graph-level features for visual tasks.
Ranked #365 on Image Classification on ImageNet
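A minimal sketch of the ViG idea follows: treat patch embeddings as graph nodes, connect each node to its k nearest neighbours in feature space, and update nodes with a max-relative graph convolution. The aggregation details are assumptions based on the paper's high-level description.

```python
# Hedged sketch: k-NN graph over patch features + max-relative graph conv.
import torch
import torch.nn as nn

def knn_graph(x: torch.Tensor, k: int) -> torch.Tensor:
    """x: (N, D) node features -> (N, k) indices of nearest neighbours."""
    dist = torch.cdist(x, x)                   # (N, N) pairwise distances
    return dist.topk(k + 1, largest=False).indices[:, 1:]  # drop self

class MaxRelativeGraphConv(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        self.fc = nn.Linear(2 * dim, dim)

    def forward(self, x: torch.Tensor, idx: torch.Tensor) -> torch.Tensor:
        neighbours = x[idx]                    # (N, k, D)
        rel = neighbours - x.unsqueeze(1)      # relative features per edge
        agg = rel.max(dim=1).values            # max aggregation over edges
        return self.fc(torch.cat([x, agg], dim=-1))

nodes = torch.randn(196, 192)                  # 14x14 patches as graph nodes
idx = knn_graph(nodes, k=9)
print(MaxRelativeGraphConv(192)(nodes, idx).shape)  # torch.Size([196, 192])
```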
4 code implementations • CVPR 2022 • Wenshuo Li, Hanting Chen, Jianyuan Guo, Ziyang Zhang, Yunhe Wang
However, due to the simplicity of their structures, performance highly depends on the local feature communication mechanism.
8 code implementations • 10 Jan 2022 • Kai Han, Yunhe Wang, Chang Xu, Jianyuan Guo, Chunjing Xu, Enhua Wu, Qi Tian
The proposed C-Ghost module can be taken as a plug-and-play component to upgrade existing convolutional neural networks.
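A compact sketch of a Ghost-style module is shown below: a primary convolution produces a few intrinsic feature maps, and cheap depthwise operations generate the remaining "ghost" features. The channel split ratio here is illustrative.

```python
# Sketch of a Ghost-style module: primary conv + cheap depthwise features.
import torch
import torch.nn as nn

class GhostModule(nn.Module):
    def __init__(self, in_ch: int, out_ch: int, ratio: int = 2):
        super().__init__()
        init_ch = out_ch // ratio
        self.primary = nn.Conv2d(in_ch, init_ch, 1, bias=False)
        self.cheap = nn.Conv2d(init_ch, out_ch - init_ch, 3, padding=1,
                               groups=init_ch, bias=False)  # depthwise

    def forward(self, x):
        y = self.primary(x)
        return torch.cat([y, self.cheap(y)], dim=1)

m = GhostModule(16, 32)
print(m(torch.randn(1, 16, 28, 28)).shape)  # torch.Size([1, 32, 28, 28])
```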
1 code implementation • 4 Jan 2022 • Kai Han, Jianyuan Guo, Yehui Tang, Yunhe Wang
We hope this new baseline will be helpful for further research on and application of vision transformers.
10 code implementations • CVPR 2022 • Yehui Tang, Kai Han, Jianyuan Guo, Chang Xu, Yanxi Li, Chao Xu, Yunhe Wang
To dynamically aggregate tokens, we propose to represent each token as a wave function with two parts, amplitude and phase.
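The wave representation can be sketched as follows: each token is expanded into real and imaginary parts, $a_j \odot \cos\theta_j$ and $a_j \odot \sin\theta_j$, before tokens are mixed channel-wise. The mixing layout below is a simplified assumption, not the Wave-MLP block.

```python
# Hedged sketch: tokens as waves (amplitude + estimated phase), then mixing.
import torch
import torch.nn as nn

class WaveTokenMix(nn.Module):
    def __init__(self, num_tokens: int, dim: int):
        super().__init__()
        self.theta = nn.Linear(dim, dim)       # estimate per-token phase
        self.mix_r = nn.Linear(num_tokens, num_tokens, bias=False)
        self.mix_i = nn.Linear(num_tokens, num_tokens, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        """x: (B, N, D) token amplitudes."""
        theta = self.theta(x)                  # (B, N, D) phases
        real = x * torch.cos(theta)            # wave -> real part
        imag = x * torch.sin(theta)            # wave -> imaginary part
        # mix information across tokens for each channel independently
        real = self.mix_r(real.transpose(1, 2)).transpose(1, 2)
        imag = self.mix_i(imag.transpose(1, 2)).transpose(1, 2)
        return real + imag

out = WaveTokenMix(num_tokens=196, dim=192)(torch.randn(2, 196, 192))
print(out.shape)  # torch.Size([2, 196, 192])
```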
10 code implementations • CVPR 2022 • Jianyuan Guo, Yehui Tang, Kai Han, Xinghao Chen, Han Wu, Chao Xu, Chang Xu, Yunhe Wang
Previous vision MLPs such as MLP-Mixer and ResMLP accept linearly flattened image patches as input, making them inflexible for different input sizes and hard to capture spatial information.
14 code implementations • CVPR 2022 • Jianyuan Guo, Kai Han, Han Wu, Yehui Tang, Xinghao Chen, Yunhe Wang, Chang Xu
Vision transformers have been successfully applied to image recognition tasks due to their ability to capture long-range dependencies within an image.
1 code implementation • 3 Jul 2021 • Zhiwei Hao, Jianyuan Guo, Ding Jia, Kai Han, Yehui Tang, Chao Zhang, Han Hu, Yunhe Wang
Specifically, we train a tiny student model to match a pre-trained teacher model in the patch-level manifold space.
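One way to read "matching in the patch-level manifold space" is as matching patch-to-patch relation matrices, which also sidesteps the teacher/student dimension mismatch; the relation and loss forms below are assumptions in that spirit.

```python
# Hedged sketch: match patch-level relation (manifold) matrices.
import torch
import torch.nn.functional as F

def manifold(feats: torch.Tensor) -> torch.Tensor:
    """feats: (B, N, D) patch features -> (B, N, N) relation matrix."""
    z = F.normalize(feats, dim=-1)
    return z @ z.transpose(1, 2)               # cosine similarities

def manifold_kd_loss(student: torch.Tensor, teacher: torch.Tensor):
    return F.mse_loss(manifold(student), manifold(teacher))

s = torch.randn(4, 196, 192)                    # tiny student's patch features
t = torch.randn(4, 196, 768)                    # larger teacher's features
print(manifold_kd_loss(s, t).item())            # relations match despite dims
```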
no code implementations • CVPR 2021 • Jianyuan Guo, Kai Han, Han Wu, Chao Zhang, Xinghao Chen, Chunjing Xu, Chang Xu, Yunhe Wang
In this paper, we present a positive-unlabeled learning based scheme to expand training data by purifying valuable images from massive unlabeled ones, where the original training data are viewed as positive data and images in the wild as unlabeled data.
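For context, one standard way to train a classifier on positive plus unlabeled data is the non-negative PU risk estimator of Kiryo et al. (2017); the sketch below shows that estimator, not necessarily the loss used in this paper, and the class prior `pi_p` is an assumed hyperparameter.

```python
# Hedged sketch of a non-negative PU (nnPU) risk estimator.
import torch
import torch.nn.functional as F

def nnpu_loss(scores_p, scores_u, pi_p: float = 0.3):
    """scores_p/scores_u: raw classifier scores for positive/unlabeled data."""
    loss_p_pos = F.softplus(-scores_p).mean()   # positives labelled positive
    loss_p_neg = F.softplus(scores_p).mean()    # positives labelled negative
    loss_u_neg = F.softplus(scores_u).mean()    # unlabeled labelled negative
    neg_risk = loss_u_neg - pi_p * loss_p_neg
    return pi_p * loss_p_pos + torch.clamp(neg_risk, min=0.0)

p = torch.randn(128)                            # scores on curated positives
u = torch.randn(1024)                           # scores on wild unlabeled data
print(nnpu_loss(p, u).item())
```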
no code implementations • CVPR 2022 • Yehui Tang, Kai Han, Yunhe Wang, Chang Xu, Jianyuan Guo, Chao Xu, DaCheng Tao
We first identify the effective patches in the last layer and then use them to guide the patch selection process of previous layers.
Ranked #8 on Efficient ViTs on ImageNet-1K (with DeiT-T)
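The top-down selection can be sketched as follows: starting from the patches kept in the last layer, each earlier layer keeps the patches that the retained ones attend to most. The attention shapes and scoring rule below are assumptions.

```python
# Hedged sketch: propagate a keep-set of patches from the last layer backward.
import torch

def select_patches(attn_per_layer, keep_last, k):
    """attn_per_layer: list of (N, N) attention maps, first to last layer;
    keep_last: (m,) indices kept in the final layer; k: patches per layer."""
    keep = keep_last
    kept_indices = [keep]
    for attn in reversed(attn_per_layer):
        scores = attn[keep].sum(dim=0)          # mass the kept patches place
        keep = scores.topk(k).indices           # on each earlier patch
        kept_indices.append(keep)
    return list(reversed(kept_indices))         # first layer ... last layer

layers = [torch.rand(196, 196).softmax(dim=-1) for _ in range(4)]
kept = select_patches(layers, keep_last=torch.arange(32), k=32)
print([ix.shape for ix in kept])
```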
1 code implementation • CVPR 2021 • Jianyuan Guo, Kai Han, Yunhe Wang, Han Wu, Xinghao Chen, Chunjing Xu, Chang Xu
To this end, we present a novel distillation algorithm via decoupled features (DeFeat) for learning a better student detector.
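Decoupling can be sketched as distilling regions inside ground-truth boxes and background regions separately, each with its own weight; the mask construction and the weights below are illustrative assumptions rather than the exact DeFeat recipe.

```python
# Hedged sketch: feature distillation decoupled into fg/bg regions.
import torch

def defeat_loss(s_feat, t_feat, fg_mask, w_fg=1.0, w_bg=0.5):
    """s_feat, t_feat: (B, C, H, W); fg_mask: (B, 1, H, W) in {0, 1}."""
    diff = (s_feat - t_feat) ** 2
    fg = (diff * fg_mask).sum() / fg_mask.sum().clamp(min=1)
    bg = (diff * (1 - fg_mask)).sum() / (1 - fg_mask).sum().clamp(min=1)
    return w_fg * fg + w_bg * bg

s = torch.randn(2, 256, 32, 32)
t = torch.randn(2, 256, 32, 32)
mask = (torch.rand(2, 1, 32, 32) > 0.8).float()  # stand-in for GT-box mask
print(defeat_loss(s, t, mask).item())
```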
12 code implementations • NeurIPS 2021 • Kai Han, An Xiao, Enhua Wu, Jianyuan Guo, Chunjing Xu, Yunhe Wang
In this paper, we point out that the attention inside these local patches is also essential for building visual transformers with high performance, and we explore a new architecture, namely, Transformer iN Transformer (TNT).
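The nesting can be sketched in a few lines: an inner transformer models pixel-level sub-patches within each patch, and its flattened output is projected and added to the corresponding outer patch token. All sizes below are illustrative.

```python
# Hedged sketch of the Transformer-iN-Transformer nesting.
import torch
import torch.nn as nn

B, N, D_out = 2, 196, 384                        # patches / outer embedding
S, D_in = 16, 24                                 # sub-patches per patch
inner = nn.TransformerEncoderLayer(D_in, nhead=4, batch_first=True)
outer = nn.TransformerEncoderLayer(D_out, nhead=6, batch_first=True)
proj = nn.Linear(S * D_in, D_out)

outer_tokens = torch.randn(B, N, D_out)
inner_tokens = torch.randn(B * N, S, D_in)       # pixel-level sub-patches
inner_out = inner(inner_tokens)                  # attention inside each patch
outer_tokens = outer_tokens + proj(inner_out.reshape(B, N, S * D_in))
outer_tokens = outer(outer_tokens)               # attention across patches
print(outer_tokens.shape)  # torch.Size([2, 196, 384])
```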
no code implementations • 23 Dec 2020 • Kai Han, Yunhe Wang, Hanting Chen, Xinghao Chen, Jianyuan Guo, Zhenhua Liu, Yehui Tang, An Xiao, Chunjing Xu, Yixing Xu, Zhaohui Yang, Yiman Zhang, DaCheng Tao
Transformer, first applied to the field of natural language processing, is a type of deep neural network mainly based on the self-attention mechanism.
6 code implementations • CVPR 2021 • Zhaohui Yang, Yunhe Wang, Xinghao Chen, Jianyuan Guo, Wei Zhang, Chao Xu, Chunjing Xu, DaCheng Tao, Chang Xu
To achieve an extremely fast NAS while preserving the high accuracy, we propose to identify the vital blocks and make them the priority in the architecture search.
1 code implementation • CVPR 2020 • Jianyuan Guo, Kai Han, Yunhe Wang, Chao Zhang, Zhaohui Yang, Han Wu, Xinghao Chen, Chang Xu
To this end, we propose a hierarchical trinity search framework to simultaneously discover efficient architectures for all components (i.e., backbone, neck, and head) of the object detector in an end-to-end manner.
34 code implementations • CVPR 2020 • Kai Han, Yunhe Wang, Qi Tian, Jianyuan Guo, Chunjing Xu, Chang Xu
Deploying convolutional neural networks (CNNs) on embedded devices is difficult due to the limited memory and computation resources.
Ranked #867 on Image Classification on ImageNet
1 code implementation • ICCV 2019 • Jianyuan Guo, Yuhui Yuan, Lang Huang, Chao Zhang, Jinge Yao, Kai Han
On the other hand, there still exist many useful contextual cues that do not fall into the scope of predefined human parts or attributes.
Ranked #59 on Person Re-Identification on DukeMTMC-reID
6 code implementations • 29 Jul 2019 • Lang Huang, Yuhui Yuan, Jianyuan Guo, Chao Zhang, Xilin Chen, Jingdong Wang
There are two successive attention modules each estimating a sparse affinity matrix.
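The interlacing idea can be sketched as two grouped attention passes: the first gathers long-range (strided) groups of positions, the second short-range (local) groups, with dense attention only inside each small group. The 1-D layout below is a simplification, and a single attention module is reused for brevity where the paper uses two.

```python
# Hedged sketch of interlaced sparse self-attention (1-D positions).
import torch
import torch.nn as nn

def group_attention(x: torch.Tensor, attn: nn.MultiheadAttention,
                    groups: int) -> torch.Tensor:
    """x: (B, N, D); run self-attention independently inside each group."""
    B, N, D = x.shape
    g = x.reshape(B * groups, N // groups, D)
    out, _ = attn(g, g, g, need_weights=False)
    return out.reshape(B, N, D)

B, N, D, P = 2, 64, 32, 8                       # P*P = N, two-stage factoring
x = torch.randn(B, N, D)
attn = nn.MultiheadAttention(D, num_heads=4, batch_first=True)
# long-range stage: a stride-P permutation puts distant positions together
perm = torch.arange(N).reshape(P, P).t().reshape(-1)
x = group_attention(x[:, perm], attn, groups=P)[:, perm.argsort()]
# short-range stage: contiguous local groups
x = group_attention(x, attn, groups=P)
print(x.shape)  # torch.Size([2, 64, 32])
```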
1 code implementation • 2 Jan 2019 • Kai Han, Jianyuan Guo, Chao Zhang, Mingjian Zhu
Based on the considerations above, we propose a novel Attribute-Aware Attention Model ($A^3M$), which can learn local attribute representation and global category representation simultaneously in an end-to-end manner.
Ranked #4 on Fine-Grained Image Classification on CompCars
8 code implementations • 4 Sep 2018 • Yuhui Yuan, Lang Huang, Jianyuan Guo, Chao Zhang, Xilin Chen, Jingdong Wang
To capture richer context information, we further combine our interlaced sparse self-attention scheme with the conventional multi-scale context schemes, including pyramid pooling (Zhao et al., 2017) and atrous spatial pyramid pooling (Chen et al., 2018).
Ranked #9 on Semantic Segmentation on Trans10K