no code implementations • 11 Apr 2024 • Zhiliang Peng, Yicheng Wang, Zhengwu Yuan, Xingsheng Wang
This paper introduces an innovative parameter extraction method for BSIM-CMG compact models, seamlessly integrating curve feature extraction and machine learning techniques.
1 code implementation • 4 Oct 2023 • Xichen Pan, Li Dong, Shaohan Huang, Zhiliang Peng, Wenhu Chen, Furu Wei
These limitations keep them far from the ultimate goal of "image as a foreign language in image generation."
2 code implementations • 26 Jun 2023 • Zhiliang Peng, Wenhui Wang, Li Dong, Yaru Hao, Shaohan Huang, Shuming Ma, Furu Wei
We introduce Kosmos-2, a Multimodal Large Language Model (MLLM), enabling new capabilities of perceiving object descriptions (e. g., bounding boxes) and grounding text to the visual world.
Ranked #11 on Visual Question Answering on ViP-Bench
1 code implementation • CVPR 2023 • Wei Huang, Zhiliang Peng, Li Dong, Furu Wei, Jianbin Jiao, Qixiang Ye
Lightweight ViT models limited by the model capacity, however, benefit little from those pre-training mechanisms.
no code implementations • CVPR 2023 • Wenhui Wang, Hangbo Bao, Li Dong, Johan Bjorck, Zhiliang Peng, Qiang Liu, Kriti Aggarwal, Owais Khan Mohammed, Saksham Singhal, Subhojit Som, Furu Wei
A big convergence of language, vision, and multimodal pretraining is emerging.
1 code implementation • 19 Oct 2022 • Zhiliang Peng, Li Dong, Hangbo Bao, Qixiang Ye, Furu Wei
Masked image modeling has demonstrated great potential to eliminate the label-hungry problem of training large-scale vision Transformers, achieving impressive performance on various downstream tasks.
4 code implementations • 12 Oct 2022 • Hongyu Wang, Shuming Ma, Shaohan Huang, Li Dong, Wenhui Wang, Zhiliang Peng, Yu Wu, Payal Bajaj, Saksham Singhal, Alon Benhaim, Barun Patra, Zhun Liu, Vishrav Chaudhary, Xia Song, Furu Wei
A big convergence of model architectures across language, vision, speech, and multimodal is emerging.
2 code implementations • 22 Aug 2022 • Wenhui Wang, Hangbo Bao, Li Dong, Johan Bjorck, Zhiliang Peng, Qiang Liu, Kriti Aggarwal, Owais Khan Mohammed, Saksham Singhal, Subhojit Som, Furu Wei
A big convergence of language, vision, and multimodal pretraining is emerging.
Ranked #1 on Visual Reasoning on NLVR2 Test
2 code implementations • 12 Aug 2022 • Zhiliang Peng, Li Dong, Hangbo Bao, Qixiang Ye, Furu Wei
The large-size BEiT v2 obtains 87. 3% top-1 accuracy for ImageNet-1K (224 size) fine-tuning, and 56. 7% mIoU on ADE20K for semantic segmentation.
Ranked #27 on Self-Supervised Image Classification on ImageNet
3 code implementations • ICCV 2023 • Feng Liu, Xiaosong Zhang, Zhiliang Peng, Zonghao Guo, Fang Wan, Xiangyang Ji, Qixiang Ye
Except for the backbone networks, however, other components such as the detector head and the feature pyramid network (FPN) remain trained from scratch, which hinders fully tapping the potential of representation models.
Ranked #3 on Few-Shot Object Detection on MS-COCO (30-shot)
1 code implementation • 6 Oct 2021 • Zhiliang Peng, Wei Huang, Zonghao Guo, Xiaosong Zhang, Jianbin Jiao, Qixiang Ye
We propose to jointly optimize empirical risks of the unbalanced and balanced domains and approximate their domain divergence by intra-class and inter-class distances, with the aim to adapt models trained on the long-tailed distribution to general distributions in an interpretable way.
4 code implementations • ICCV 2021 • Zhiliang Peng, Wei Huang, Shanzhi Gu, Lingxi Xie, YaoWei Wang, Jianbin Jiao, Qixiang Ye
Within Convolutional Neural Network (CNN), the convolution operations are good at extracting local features but experience difficulty to capture global representations.
Ranked #325 on Image Classification on ImageNet
2 code implementations • ICCV 2021 • Wei Gao, Fang Wan, Xingjia Pan, Zhiliang Peng, Qi Tian, Zhenjun Han, Bolei Zhou, Qixiang Ye
TS-CAM finally couples the patch tokens with the semantic-agnostic attention map to achieve semantic-aware localization.