1 code implementation • 22 Apr 2024 • Yuying Ge, Sijie Zhao, Jinguo Zhu, Yixiao Ge, Kun Yi, Lin Song, Chen Li, Xiaohan Ding, Ying Shan
We hope that our work will inspire future research into what can be achieved by versatile multimodal foundation models in real-world applications.
no code implementations • 1 Mar 2024 • Xiaohan Ding, Buse Carik, Uma Sushmitha Gunturi, Valerie Reyna, Eugenia H. Rho
We introduce a multi-step reasoning framework using prompt-based LLMs to examine the relationship between social media language patterns and trends in national health outcomes.
1 code implementation • 5 Feb 2024 • Yiyuan Zhang, Yuhao Kang, Zhixin Zhang, Xiaohan Ding, Sanyuan Zhao, Xiangyu Yue
We introduce $\textit{InteractiveVideo}$, a user-centric framework for video generation.
1 code implementation • 25 Jan 2024 • Yiyuan Zhang, Xiaohan Ding, Kaixiong Gong, Yixiao Ge, Ying Shan, Xiangyu Yue
We propose to improve transformers of a specific modality with irrelevant data from other modalities, e.g., improve an ImageNet model with audio or point cloud datasets.
1 code implementation • 14 Dec 2023 • Jinguo Zhu, Xiaohan Ding, Yixiao Ge, Yuying Ge, Sijie Zhao, Hengshuang Zhao, Xiaohua Wang, Ying Shan
In combination with the existing text tokenizer and detokenizer, this framework allows for the encoding of interleaved image-text data into a multimodal sequence, which can subsequently be fed into the transformer model.
1 code implementation • 6 Dec 2023 • Zhixin Zhang, Yiyuan Zhang, Xiaohan Ding, Fusheng Jin, Xiangyu Yue
In our work, we propose GeMap ($\textbf{Ge}$ometry $\textbf{Map}$), which learns the Euclidean shapes and relations of map instances end to end, going beyond basic perception.
2 code implementations • 27 Nov 2023 • Xiaohan Ding, Yiyuan Zhang, Yixiao Ge, Sijie Zhao, Lin Song, Xiangyu Yue, Ying Shan
1) We propose four architectural guidelines for designing large-kernel ConvNets, the core of which is to exploit the essential characteristics of large kernels that distinguish them from small kernels: they can see wide without going deep.
Ranked #1 on Object Detection on COCO 2017 (mAP metric)
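A minimal sketch of the "wide without deep" idea as it is commonly realized in large-kernel ConvNets: a large depthwise kernel trained alongside a small parallel kernel that is folded in after training. The class and method names are hypothetical and the kernel sizes illustrative; this is not the paper's configuration or code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LargeKernelDW(nn.Module):
    """Sketch: large depthwise kernel + mergeable small parallel branch."""
    def __init__(self, channels, big=13, small=3):
        super().__init__()
        self.big = nn.Conv2d(channels, channels, big, padding=big // 2,
                             groups=channels, bias=False)
        self.small = nn.Conv2d(channels, channels, small, padding=small // 2,
                               groups=channels, bias=False)

    def forward(self, x):
        return self.big(x) + self.small(x)

    @torch.no_grad()
    def fuse(self):
        # zero-pad the small kernel to the big size and add it in, leaving a
        # single large depthwise conv for inference
        pad = (self.big.kernel_size[0] - self.small.kernel_size[0]) // 2
        self.big.weight += F.pad(self.small.weight, [pad] * 4)
        return self.big
```

After training, `block.fuse()` returns a single merged conv whose output matches `block(x)` exactly, since convolution is additive in its kernel.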
1 code implementation • 26 Nov 2023 • Chongjian Ge, Xiaohan Ding, Zhan Tong, Li Yuan, Jiangliu Wang, Yibing Song, Ping Luo
The attention map is computed from the mixtures of tokens and group proxies and is then used to re-combine the tokens and groups in the Value.
1 code implementation • 16 Oct 2023 • Zhicheng Cai, Xiaohan Ding, Qiu Shen, Xun Cao
We propose Re-parameterized Refocusing Convolution (RefConv) as a replacement for regular convolutional layers: a plug-and-play module that improves performance without any inference-time cost.
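A hedged sketch of what a "re-parameterized refocusing" layer could look like: the effective kernel is a learned, residual transform of a frozen pretrained kernel, so the extra parameters can be folded away after training. All names and the exact transform here are assumptions for illustration, not the paper's implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RefConv2d(nn.Module):
    """Sketch: effective kernel = frozen basis kernel + learned refocusing."""
    def __init__(self, pretrained_conv, refocus_k=3):
        super().__init__()
        # frozen pretrained kernel, shape (c_out, c_in, k, k)
        self.register_buffer("basis", pretrained_conv.weight.detach().clone())
        c_out = self.basis.shape[0]
        # one small learnable filter per output channel; zero init makes the
        # transform start as the identity (effective weight == basis)
        self.refocus = nn.Parameter(torch.zeros(c_out, 1, refocus_k, refocus_k))
        self.stride = pretrained_conv.stride
        self.padding = pretrained_conv.padding

    def effective_weight(self):
        c_out, c_in, k, _ = self.basis.shape
        w = self.basis.reshape(1, c_out * c_in, k, k)
        r = self.refocus.repeat_interleave(c_in, dim=0)
        # treat each (k, k) kernel slice as an image and refocus it
        delta = F.conv2d(w, r, padding=r.shape[-1] // 2, groups=c_out * c_in)
        return self.basis + delta.reshape(c_out, c_in, k, k)

    def forward(self, x):
        # at deployment the effective weight can be computed once and baked
        # into a plain conv, so inference cost is unchanged
        return F.conv2d(x, self.effective_weight(),
                        stride=self.stride, padding=self.padding)
```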
1 code implementation • 16 Oct 2023 • Yiyuan Zhang, Kaixiong Gong, Xiaohan Ding, Kaipeng Zhang, Fangrui Lv, Kurt Keutzer, Xiangyu Yue
We propose $\textbf{UniDG}$, a novel and $\textbf{Uni}$fied framework for $\textbf{D}$omain $\textbf{G}$eneralization that is capable of significantly enhancing the out-of-distribution generalization performance of foundation models regardless of their architectures.
Ranked #1 on Domain Generalization on TerraIncognita
no code implementations • 12 Jun 2023 • Sijie Zhao, Yixiao Ge, Zhongang Qi, Lin Song, Xiaohan Ding, Zehua Xie, Ying Shan
Therefore, we propose StickerCLIP as a benchmark model on the Sticker820K dataset.
no code implementations • 12 Jun 2023 • Shiming Chen, Wenjin Hou, Ziming Hong, Xiaohan Ding, Yibing Song, Xinge You, Tongliang Liu, Kun Zhang
After alignment, the synthesized sample features of unseen classes are closer to the real sample features, and DSP improves existing generative ZSL methods by 8.5%, 8.0%, and 9.7% on the standard CUB, SUN, and AWA2 datasets; this significant improvement indicates that evolving the semantic prototype opens a largely unexplored direction in ZSL.
1 code implementation • 20 May 2023 • Guangzhi Wang, Yixiao Ge, Xiaohan Ding, Mohan Kankanhalli, Ying Shan
In our benchmark, which is curated to evaluate MLLMs' visual semantic understanding and fine-grained perception capabilities, we compare visual tokenizers pre-trained with dominant methods (i.e., DeiT, CLIP, MAE, and DINO) and observe that: i) fully/weakly supervised models capture more semantics than self-supervised models, but the gap narrows as the pre-training dataset scales up.
no code implementations • 1 Mar 2023 • Uma Gunturi, Xiaohan Ding, Eugenia H. Rho
By making the classification process explainable, ToxVis provides a valuable tool for understanding the nuances of hateful content and supporting more effective content moderation.
no code implementations • 20 Jan 2023 • Xiaohan Ding, Mike Horning, Eugenia H. Rho
By analyzing a decade's worth of closed captions (2 million speaker turns) from CNN and Fox News along with topically corresponding discourse from Twitter, we provide a novel framework for measuring semantic polarization between America's two major broadcast networks to demonstrate how semantic polarization between these outlets has evolved (Study 1), peaked (Study 2) and influenced partisan discussions on Twitter (Study 3) across the last decade.
1 code implementation • 30 May 2022 • Xiaohan Ding, Honghao Chen, Xiangyu Zhang, Kaiqi Huang, Jungong Han, Guiguang Ding
To pursue extreme structural simplicity, we focus on a VGG-style plain model and show that such a simple model trained with a RepOptimizer, referred to as RepOpt-VGG, performs on par with or better than recent well-designed models.
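A minimal sketch of the gradient re-parameterization idea as described here: the structural prior is expressed as constant per-parameter gradient multipliers applied before a plain SGD update, rather than as extra training-time branches. `GradMultSGD` and `grad_mults` are hypothetical names; the actual multiplier values are derived from the equivalent multi-branch model and are not reproduced here.

```python
import torch

class GradMultSGD(torch.optim.SGD):
    """Sketch: SGD whose gradients are rescaled by structure-derived constants."""
    def __init__(self, params, grad_mults, **kwargs):
        super().__init__(params, **kwargs)
        self.grad_mults = grad_mults   # {parameter: constant multiplier tensor}

    @torch.no_grad()
    def step(self, closure=None):
        # rescale gradients by the fixed multipliers, then do a plain SGD update
        for p, mult in self.grad_mults.items():
            if p.grad is not None:
                p.grad.mul_(mult)
        return super().step(closure)
```

The trained model keeps its plain topology throughout, so nothing needs to be converted at inference time.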
7 code implementations • CVPR 2022 • Xiaohan Ding, Xiangyu Zhang, Yizhuang Zhou, Jungong Han, Guiguang Ding, Jian Sun
We revisit large kernel design in modern convolutional neural networks (CNNs).
Ranked #75 on Image Classification on ImageNet
no code implementations • 10 Jan 2022 • Chhavi Choudhury, Ankur Gandhe, Xiaohan Ding, Ivan Bulyko
In this work, we explore a likelihood-ratio based contextual biasing approach that leverages text data sources to adapt an RNN-T model to new domains and entities.
Automatic Speech Recognition (ASR) +2
4 code implementations • CVPR 2022 • Xiaohan Ding, Honghao Chen, Xiangyu Zhang, Jungong Han, Guiguang Ding
Our results reveal that 1) Locality Injection is a general methodology for MLP models; 2) RepMLPNet offers a favorable accuracy-efficiency trade-off compared to other MLPs; 3) RepMLPNet is the first MLP that seamlessly transfers to Cityscapes semantic segmentation.
Ranked #61 on Semantic Segmentation on Cityscapes val
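One way Locality Injection can be realized, sketched under the assumption of a stride-1, same-padded conv on a small feature map: a convolution is a fully-connected layer with a structured weight matrix, so the equivalent matrix can be extracted by probing with one-hot inputs and merged into an FC layer's weights. `conv_to_fc` is a hypothetical helper, not the paper's code.

```python
import torch
import torch.nn.functional as F

def conv_to_fc(conv_weight, in_ch, h, w):
    """Sketch: equivalent FC matrix of a stride-1, same-padded convolution."""
    k = conv_weight.shape[-1]
    n = in_ch * h * w
    # probe with one-hot inputs: column j of the FC matrix is the flattened
    # conv response to the j-th one-hot input (practical only for small h, w)
    eye = torch.eye(n).reshape(n, in_ch, h, w)
    out = F.conv2d(eye, conv_weight, padding=k // 2)   # (n, out_ch, h, w)
    return out.reshape(n, -1).t()                      # (out_dim, in_dim)

# merging a trained conv branch into an FC layer (bias handling omitted):
# fc.weight.data += conv_to_fc(conv.weight, C, H, W)
```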
2 code implementations • 30 Jul 2021 • Xiaohan Ding, Tianxiang Hao, Jungong Han, Yuchen Guo, Guiguang Ding
The existence of redundancy in Convolutional Neural Networks (CNNs) enables us to remove some filters/channels with acceptable performance drops.
10 code implementations • 5 May 2021 • Xiaohan Ding, Chunlong Xia, Xiangyu Zhang, Xiaojie Chu, Jungong Han, Guiguang Ding
We propose RepMLP, a multi-layer-perceptron-style neural network building block for image recognition, which is composed of a series of fully-connected (FC) layers.
Ranked #753 on Image Classification on ImageNet
2 code implementations • CVPR 2021 • Xiaohan Ding, Xiangyu Zhang, Jungong Han, Guiguang Ding
We propose a universal building block of Convolutional Neural Network (ConvNet) to improve the performance without any inference-time costs.
22 code implementations • CVPR 2021 • Xiaohan Ding, Xiangyu Zhang, Ningning Ma, Jungong Han, Guiguang Ding, Jian Sun
We present a simple but powerful architecture of convolutional neural network, which has a VGG-like inference-time body composed of nothing but a stack of 3x3 convolution and ReLU, while the training-time model has a multi-branch topology.
Ranked #44 on Semantic Segmentation on Cityscapes val
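A minimal sketch of how such a multi-branch training-time block can be collapsed into a single 3x3 conv, assuming groups=1 and matching input/output channels for the identity branch; function names are illustrative.

```python
import torch
import torch.nn.functional as F

def fuse_conv_bn(w, bn):
    # fold a BatchNorm into the preceding conv: y = gamma*(w*x - mean)/std + beta
    std = (bn.running_var + bn.eps).sqrt()
    return (w * (bn.weight / std).reshape(-1, 1, 1, 1),
            bn.bias - bn.running_mean * bn.weight / std)

@torch.no_grad()
def fuse_block(conv3, bn3, conv1, bn1, bn_id, channels):
    """Sketch: merge 3x3, 1x1, and identity branches into one 3x3 conv."""
    w3, b3 = fuse_conv_bn(conv3.weight, bn3)
    w1, b1 = fuse_conv_bn(conv1.weight, bn1)
    w1 = F.pad(w1, [1, 1, 1, 1])                    # 1x1 -> 3x3
    wid = torch.zeros(channels, channels, 3, 3)     # identity as a 3x3 kernel
    wid[torch.arange(channels), torch.arange(channels), 1, 1] = 1.0
    wid, bid = fuse_conv_bn(wid, bn_id)
    return w3 + w1 + wid, b3 + b1 + bid             # single 3x3 weight and bias
```

Because convolution is linear in its kernel, the fused weight and bias reproduce the three-branch output exactly, leaving a plain stack of 3x3 convs and ReLUs at inference.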
6 code implementations • ICCV 2021 • Xiaohan Ding, Tianxiang Hao, Jianchao Tan, Ji Liu, Jungong Han, Yuchen Guo, Guiguang Ding
Via training with regular SGD on the former but a novel update rule with penalty gradients on the latter, we realize structured sparsity.
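A hedged sketch of a penalty-gradient update for structured sparsity, assuming a group-lasso-style penalty applied per filter; `lam` and the helper name are hypothetical, and the paper's exact update rule may differ.

```python
import torch

@torch.no_grad()
def penalty_gradient_step(filters, lr, lam=1e-4):
    """Sketch: SGD step with an added group-lasso penalty gradient."""
    # for each conv weight of shape (out_ch, in_ch, k, k), add the gradient of
    # lam * sum_i ||w_i||_2, which pushes whole filters toward zero
    for w in filters:
        norms = w.flatten(1).norm(dim=1).clamp_min(1e-12)  # per-filter L2 norm
        penalty = w / norms.reshape(-1, 1, 1, 1)           # d||w_i|| / dw_i
        w -= lr * (w.grad + lam * penalty)
```

Filters driven to (near) zero by the penalty can then be removed, yielding a structurally smaller network rather than scattered zero weights.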
4 code implementations • NeurIPS 2019 • Xiaohan Ding, Guiguang Ding, Xiangxin Zhou, Yuchen Guo, Jungong Han, Ji Liu
Deep Neural Networks (DNNs) are powerful but computationally expensive and memory intensive, impeding their practical usage on resource-constrained front-end devices.
5 code implementations • ICCV 2019 • Xiaohan Ding, Yuchen Guo, Guiguang Ding, Jungong Han
We propose Asymmetric Convolution Block (ACB), an architecture-neutral structure as a CNN building block, which uses 1D asymmetric convolutions to strengthen the square convolution kernels.
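A minimal sketch of how the 1D asymmetric branches can be folded into the square kernel at inference; the helper name is hypothetical, and per-branch BN folding is omitted for brevity.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def fuse_acb(w3x3, w1x3, w3x1):
    """Sketch: collapse 3x3 + 1x3 + 3x1 branches into one 3x3 kernel."""
    # zero-pad the 1D kernels to 3x3 and sum: convolution is additive in its
    # kernel, so the three training-time branches become one square conv
    return (w3x3
            + F.pad(w1x3, [0, 0, 1, 1])   # (1,3): pad rows top/bottom
            + F.pad(w3x1, [1, 1, 0, 0]))  # (3,1): pad cols left/right
```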
1 code implementation • 12 May 2019 • Xiaohan Ding, Guiguang Ding, Yuchen Guo, Jungong Han, Chenggang Yan
It is not easy to design and run Convolutional Neural Networks (CNNs) due to: 1) finding the optimal number of filters (i.e., the width) at each layer is tricky, given an architecture; and 2) the computational intensity of CNNs impedes deployment on computationally limited devices.
1 code implementation • CVPR 2019 • Xiaohan Ding, Guiguang Ding, Yuchen Guo, Jungong Han
Redundancy is widely recognized in Convolutional Neural Networks (CNNs); it makes it possible to remove unimportant filters from convolutional layers so as to slim the network with an acceptable performance drop.