1 code implementation • 22 Apr 2024 • Yuying Ge, Sijie Zhao, Jinguo Zhu, Yixiao Ge, Kun Yi, Lin Song, Chen Li, Xiaohan Ding, Ying Shan
We hope that our work will inspire future research into what can be achieved by versatile multimodal foundation models in real-world applications.
no code implementations • 1 Mar 2024 • Xiaohan Ding, Buse Carik, Uma Sushmitha Gunturi, Valerie Reyna, Eugenia H. Rho
We introduce a multi-step reasoning framework using prompt-based LLMs to examine the relationship between social media language patterns and trends in national health outcomes.
1 code implementation • 5 Feb 2024 • Yiyuan Zhang, Yuhao Kang, Zhixin Zhang, Xiaohan Ding, Sanyuan Zhao, Xiangyu Yue
We introduce $\textit{InteractiveVideo}$, a user-centric framework for video generation.
1 code implementation • 25 Jan 2024 • Yiyuan Zhang, Xiaohan Ding, Kaixiong Gong, Yixiao Ge, Ying Shan, Xiangyu Yue
We propose to improve transformers of a specific modality with irrelevant data from other modalities, e.g., improve an ImageNet model with audio or point cloud datasets.
1 code implementation • 14 Dec 2023 • Jinguo Zhu, Xiaohan Ding, Yixiao Ge, Yuying Ge, Sijie Zhao, Hengshuang Zhao, Xiaohua Wang, Ying Shan
In combination with the existing text tokenizer and detokenizer, this framework allows for the encoding of interleaved image-text data into a multimodal sequence, which can subsequently be fed into the transformer model.
1 code implementation • 6 Dec 2023 • Zhixin Zhang, Yiyuan Zhang, Xiaohan Ding, Fusheng Jin, Xiangyu Yue
In our work, we propose GeMap ($\textbf{Ge}$ometry $\textbf{Map}$), which learns the Euclidean shapes and relations of map instances end to end, going beyond basic perception.
2 code implementations • 27 Nov 2023 • Xiaohan Ding, Yiyuan Zhang, Yixiao Ge, Sijie Zhao, Lin Song, Xiangyu Yue, Ying Shan
1) We propose four architectural guidelines for designing large-kernel ConvNets, the core of which is to exploit the essential characteristics of large kernels that distinguish them from small kernels: they can see wide without going deep.
Ranked #1 on Object Detection on COCO 2017 (mAP metric)
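A minimal sketch of the "wide without deep" idea as it is commonly realized in large-kernel ConvNets: a large depthwise kernel trained alongside a small parallel kernel that is folded in after training. The class and method names are hypothetical and the kernel sizes illustrative; this is not the paper's configuration or code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LargeKernelDW(nn.Module):
    """Sketch: large depthwise kernel + mergeable small parallel branch."""
    def __init__(self, channels, big=13, small=3):
        super().__init__()
        self.big = nn.Conv2d(channels, channels, big, padding=big // 2,
                             groups=channels, bias=False)
        self.small = nn.Conv2d(channels, channels, small, padding=small // 2,
                               groups=channels, bias=False)

    def forward(self, x):
        return self.big(x) + self.small(x)

    @torch.no_grad()
    def fuse(self):
        # zero-pad the small kernel to the big size and add it in, leaving a
        # single large depthwise conv for inference
        pad = (self.big.kernel_size[0] - self.small.kernel_size[0]) // 2
        self.big.weight += F.pad(self.small.weight, [pad] * 4)
        return self.big
```

After training, `block.fuse()` returns a single merged conv whose output matches `block(x)` exactly, since convolution is additive in its kernel.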
1 code implementation • 26 Nov 2023 • Chongjian Ge, Xiaohan Ding, Zhan Tong, Li Yuan, Jiangliu Wang, Yibing Song, Ping Luo
The attention map is computed from the mixtures of tokens and group proxies and is then used to re-combine the tokens and groups in the Value.
1 code implementation • 16 Oct 2023 • Zhicheng Cai, Xiaohan Ding, Qiu Shen, Xun Cao
We propose Re-parameterized Refocusing Convolution (RefConv) as a replacement for regular convolutional layers: a plug-and-play module that improves performance without any inference-time cost.
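A hedged sketch of what a "re-parameterized refocusing" layer could look like: the effective kernel is a learned, residual transform of a frozen pretrained kernel, so the extra parameters can be folded away after training. All names and the exact transform here are assumptions for illustration, not the paper's implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RefConv2d(nn.Module):
    """Sketch: effective kernel = frozen basis kernel + learned refocusing."""
    def __init__(self, pretrained_conv, refocus_k=3):
        super().__init__()
        # frozen pretrained kernel, shape (c_out, c_in, k, k)
        self.register_buffer("basis", pretrained_conv.weight.detach().clone())
        c_out = self.basis.shape[0]
        # one small learnable filter per output channel; zero init makes the
        # transform start as the identity (effective weight == basis)
        self.refocus = nn.Parameter(torch.zeros(c_out, 1, refocus_k, refocus_k))
        self.stride = pretrained_conv.stride
        self.padding = pretrained_conv.padding

    def effective_weight(self):
        c_out, c_in, k, _ = self.basis.shape
        w = self.basis.reshape(1, c_out * c_in, k, k)
        r = self.refocus.repeat_interleave(c_in, dim=0)
        # treat each (k, k) kernel slice as an image and refocus it
        delta = F.conv2d(w, r, padding=r.shape[-1] // 2, groups=c_out * c_in)
        return self.basis + delta.reshape(c_out, c_in, k, k)

    def forward(self, x):
        # at deployment the effective weight can be computed once and baked
        # into a plain conv, so inference cost is unchanged
        return F.conv2d(x, self.effective_weight(),
                        stride=self.stride, padding=self.padding)
```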
1 code implementation • 16 Oct 2023 • Yiyuan Zhang, Kaixiong Gong, Xiaohan Ding, Kaipeng Zhang, Fangrui Lv, Kurt Keutzer, Xiangyu Yue
We propose $\textbf{UniDG}$, a novel and $\textbf{Uni}$fied framework for $\textbf{D}$omain $\textbf{G}$eneralization that is capable of significantly enhancing the out-of-distribution generalization performance of foundation models regardless of their architectures.
Ranked #1 on Domain Generalization on TerraIncognita
no code implementations • 12 Jun 2023 • Sijie Zhao, Yixiao Ge, Zhongang Qi, Lin Song, Xiaohan Ding, Zehua Xie, Ying Shan
Therefore, we propose StickerCLIP as a benchmark model on the Sticker820K dataset.
no code implementations • 12 Jun 2023 • Shiming Chen, Wenjin Hou, Ziming Hong, Xiaohan Ding, Yibing Song, Xinge You, Tongliang Liu, Kun Zhang
After alignment, the synthesized sample features of unseen classes are closer to the real sample features, and DSP improves existing generative ZSL methods by 8.5%, 8.0%, and 9.7% on the standard CUB, SUN, and AWA2 datasets; this significant improvement indicates that evolving the semantic prototype opens a largely unexplored direction in ZSL.
1 code implementation • 20 May 2023 • Guangzhi Wang, Yixiao Ge, Xiaohan Ding, Mohan Kankanhalli, Ying Shan
In our benchmark, which is curated to evaluate MLLMs' visual semantic understanding and fine-grained perception capabilities, we compare visual tokenizers pre-trained with dominant methods (i.e., DeiT, CLIP, MAE, and DINO) and observe that: i) fully/weakly supervised models capture more semantics than self-supervised models, but the gap narrows as the pre-training dataset scales up.
no code implementations • 1 Mar 2023 • Uma Gunturi, Xiaohan Ding, Eugenia H. Rho
By making the classification process explainable, ToxVis provides a valuable tool for understanding the nuances of hateful content and supporting more effective content moderation.
no code implementations • 20 Jan 2023 • Xiaohan Ding, Mike Horning, Eugenia H. Rho
By analyzing a decade's worth of closed captions (2 million speaker turns) from CNN and Fox News along with topically corresponding discourse from Twitter, we provide a novel framework for measuring semantic polarization between America's two major broadcast networks to demonstrate how semantic polarization between these outlets has evolved (Study 1), peaked (Study 2) and influenced partisan discussions on Twitter (Study 3) across the last decade.
1 code implementation • 30 May 2022 • Xiaohan Ding, Honghao Chen, Xiangyu Zhang, Kaiqi Huang, Jungong Han, Guiguang Ding
To pursue extreme structural simplicity, we focus on a VGG-style plain model and show that such a simple model trained with a RepOptimizer, referred to as RepOpt-VGG, performs on par with or better than recent well-designed models.
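A minimal sketch of the gradient re-parameterization idea as described here: the structural prior is expressed as constant per-parameter gradient multipliers applied before a plain SGD update, rather than as extra training-time branches. `GradMultSGD` and `grad_mults` are hypothetical names; the actual multiplier values are derived from the equivalent multi-branch model and are not reproduced here.

```python
import torch

class GradMultSGD(torch.optim.SGD):
    """Sketch: SGD whose gradients are rescaled by structure-derived constants."""
    def __init__(self, params, grad_mults, **kwargs):
        super().__init__(params, **kwargs)
        self.grad_mults = grad_mults   # {parameter: constant multiplier tensor}

    @torch.no_grad()
    def step(self, closure=None):
        # rescale gradients by the fixed multipliers, then do a plain SGD update
        for p, mult in self.grad_mults.items():
            if p.grad is not None:
                p.grad.mul_(mult)
        return super().step(closure)
```

The trained model keeps its plain topology throughout, so nothing needs to be converted at inference time.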
7 code implementations • CVPR 2022 • Xiaohan Ding, Xiangyu Zhang, Yizhuang Zhou, Jungong Han, Guiguang Ding, Jian Sun
We revisit large kernel design in modern convolutional neural networks (CNNs).
Ranked #75 on Image Classification on ImageNet
no code implementations • 10 Jan 2022 • Chhavi Choudhury, Ankur Gandhe, Xiaohan Ding, Ivan Bulyko
In this work, we explore a likelihood-ratio based contextual biasing approach that leverages text data sources to adapt an RNN-T model to new domains and entities.
Automatic Speech Recognition (ASR) +2
4 code implementations • CVPR 2022 • Xiaohan Ding, Honghao Chen, Xiangyu Zhang, Jungong Han, Guiguang Ding
Our results reveal that 1) Locality Injection is a general methodology for MLP models; 2) RepMLPNet offers a favorable accuracy-efficiency trade-off compared to other MLPs; 3) RepMLPNet is the first MLP that seamlessly transfers to Cityscapes semantic segmentation.
Ranked #61 on Semantic Segmentation on Cityscapes val
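One way Locality Injection can be realized, sketched under the assumption of a stride-1, same-padded conv on a small feature map: a convolution is a fully-connected layer with a structured weight matrix, so the equivalent matrix can be extracted by probing with one-hot inputs and merged into an FC layer's weights. `conv_to_fc` is a hypothetical helper, not the paper's code.

```python
import torch
import torch.nn.functional as F

def conv_to_fc(conv_weight, in_ch, h, w):
    """Sketch: equivalent FC matrix of a stride-1, same-padded convolution."""
    k = conv_weight.shape[-1]
    n = in_ch * h * w
    # probe with one-hot inputs: column j of the FC matrix is the flattened
    # conv response to the j-th one-hot input (practical only for small h, w)
    eye = torch.eye(n).reshape(n, in_ch, h, w)
    out = F.conv2d(eye, conv_weight, padding=k // 2)   # (n, out_ch, h, w)
    return out.reshape(n, -1).t()                      # (out_dim, in_dim)

# merging a trained conv branch into an FC layer (bias handling omitted):
# fc.weight.data += conv_to_fc(conv.weight, C, H, W)
```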
2 code implementations • 30 Jul 2021 • Xiaohan Ding, Tianxiang Hao, Jungong Han, Yuchen Guo, Guiguang Ding
The existence of redundancy in Convolutional Neural Networks (CNNs) enables us to remove some filters/channels with acceptable performance drops.
10 code implementations • 5 May 2021 • Xiaohan Ding, Chunlong Xia, Xiangyu Zhang, Xiaojie Chu, Jungong Han, Guiguang Ding
We propose RepMLP, a multi-layer-perceptron-style neural network building block for image recognition, which is composed of a series of fully-connected (FC) layers.
Ranked #753 on Image Classification on ImageNet
2 code implementations • CVPR 2021 • Xiaohan Ding, Xiangyu Zhang, Jungong Han, Guiguang Ding
We propose a universal building block of Convolutional Neural Network (ConvNet) to improve the performance without any inference-time costs.
22 code implementations • CVPR 2021 • Xiaohan Ding, Xiangyu Zhang, Ningning Ma, Jungong Han, Guiguang Ding, Jian Sun
We present a simple but powerful architecture of convolutional neural network, which has a VGG-like inference-time body composed of nothing but a stack of 3x3 convolution and ReLU, while the training-time model has a multi-branch topology.
Ranked #44 on Semantic Segmentation on Cityscapes val
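A minimal sketch of how such a multi-branch training-time block can be collapsed into a single 3x3 conv, assuming groups=1 and matching input/output channels for the identity branch; function names are illustrative.

```python
import torch
import torch.nn.functional as F

def fuse_conv_bn(w, bn):
    # fold a BatchNorm into the preceding conv: y = gamma*(w*x - mean)/std + beta
    std = (bn.running_var + bn.eps).sqrt()
    return (w * (bn.weight / std).reshape(-1, 1, 1, 1),
            bn.bias - bn.running_mean * bn.weight / std)

@torch.no_grad()
def fuse_block(conv3, bn3, conv1, bn1, bn_id, channels):
    """Sketch: merge 3x3, 1x1, and identity branches into one 3x3 conv."""
    w3, b3 = fuse_conv_bn(conv3.weight, bn3)
    w1, b1 = fuse_conv_bn(conv1.weight, bn1)
    w1 = F.pad(w1, [1, 1, 1, 1])                    # 1x1 -> 3x3
    wid = torch.zeros(channels, channels, 3, 3)     # identity as a 3x3 kernel
    wid[torch.arange(channels), torch.arange(channels), 1, 1] = 1.0
    wid, bid = fuse_conv_bn(wid, bn_id)
    return w3 + w1 + wid, b3 + b1 + bid             # single 3x3 weight and bias
```

Because convolution is linear in its kernel, the fused weight and bias reproduce the three-branch output exactly, leaving a plain stack of 3x3 convs and ReLUs at inference.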
6 code implementations • ICCV 2021 • Xiaohan Ding, Tianxiang Hao, Jianchao Tan, Ji Liu, Jungong Han, Yuchen Guo, Guiguang Ding
Via training with regular SGD on the former but a novel update rule with penalty gradients on the latter, we realize structured sparsity.
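A hedged sketch of a penalty-gradient update for structured sparsity, assuming a group-lasso-style penalty applied per filter; `lam` and the helper name are hypothetical, and the paper's exact update rule may differ.

```python
import torch

@torch.no_grad()
def penalty_gradient_step(filters, lr, lam=1e-4):
    """Sketch: SGD step with an added group-lasso penalty gradient."""
    # for each conv weight of shape (out_ch, in_ch, k, k), add the gradient of
    # lam * sum_i ||w_i||_2, which pushes whole filters toward zero
    for w in filters:
        norms = w.flatten(1).norm(dim=1).clamp_min(1e-12)  # per-filter L2 norm
        penalty = w / norms.reshape(-1, 1, 1, 1)           # d||w_i|| / dw_i
        w -= lr * (w.grad + lam * penalty)
```

Filters driven to (near) zero by the penalty can then be removed, yielding a structurally smaller network rather than scattered zero weights.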
4 code implementations • NeurIPS 2019 • Xiaohan Ding, Guiguang Ding, Xiangxin Zhou, Yuchen Guo, Jungong Han, Ji Liu
Deep Neural Networks (DNNs) are powerful but computationally expensive and memory intensive, impeding their practical usage on resource-constrained front-end devices.
5 code implementations • ICCV 2019 • Xiaohan Ding, Yuchen Guo, Guiguang Ding, Jungong Han
We propose Asymmetric Convolution Block (ACB), an architecture-neutral structure as a CNN building block, which uses 1D asymmetric convolutions to strengthen the square convolution kernels.
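A minimal sketch of how the 1D asymmetric branches can be folded into the square kernel at inference; the helper name is hypothetical, and per-branch BN folding is omitted for brevity.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def fuse_acb(w3x3, w1x3, w3x1):
    """Sketch: collapse 3x3 + 1x3 + 3x1 branches into one 3x3 kernel."""
    # zero-pad the 1D kernels to 3x3 and sum: convolution is additive in its
    # kernel, so the three training-time branches become one square conv
    return (w3x3
            + F.pad(w1x3, [0, 0, 1, 1])   # (1,3): pad rows top/bottom
            + F.pad(w3x1, [1, 1, 0, 0]))  # (3,1): pad cols left/right
```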
1 code implementation • 12 May 2019 • Xiaohan Ding, Guiguang Ding, Yuchen Guo, Jungong Han, Chenggang Yan
It is not easy to design and run Convolutional Neural Networks (CNNs) due to: 1) finding the optimal number of filters (i.e., the width) at each layer is tricky, given an architecture; and 2) the computational intensity of CNNs impedes deployment on computationally limited devices.
1 code implementation • CVPR 2019 • Xiaohan Ding, Guiguang Ding, Yuchen Guo, Jungong Han
Redundancy is widely recognized in Convolutional Neural Networks (CNNs); it makes it possible to remove unimportant filters from convolutional layers so as to slim the network with an acceptable performance drop.