Search Results for author: Xiangyu Zhang

Found 200 papers, 115 papers with code

Source Code Foundation Models are Transferable Binary Analysis Knowledge Bases

no code implementations • 30 May 2024 • Zian Su, Xiangzhe Xu, Ziyang Huang, Kaiyuan Zhang, Xiangyu Zhang

Recent advancements in uni-modal code model pre-training, particularly in generative Source Code Foundation Models (SCFMs) and binary understanding models, have laid the groundwork for transfer learning applicable to HOBRE.

Transfer Learning

Is a 3D-Tokenized LLM the Key to Reliable Autonomous Driving?

no code implementations • 28 May 2024 • Yifan Bai, Dongming Wu, Yingfei Liu, Fan Jia, Weixin Mao, Ziheng Zhang, Yucheng Zhao, Jianbing Shen, Xing Wei, Tiancai Wang, Xiangyu Zhang

Despite its simplicity, Atlas demonstrates superior performance in both 3D detection and ego planning tasks on nuScenes dataset, proving that 3D-tokenized LLM is the key to reliable autonomous driving.

3D Object Detection Autonomous Driving +4

Reflected Flow Matching

1 code implementation • 26 May 2024 • Tianyu Xie, Yu Zhu, Longlin Yu, Tong Yang, Ziheng Cheng, Shiyue Zhang, Xiangyu Zhang, Cheng Zhang

We propose reflected flow matching (RFM) to train the velocity model in reflected CNFs by matching the conditional velocity fields in a simulation-free manner, similar to the vanilla FM.

Focus Anywhere for Fine-grained Multi-page Document Understanding

1 code implementation • 23 May 2024 • Chenglong Liu, Haoran Wei, Jinyue Chen, Lingyu Kong, Zheng Ge, Zining Zhu, Liang Zhao, Jianjian Sun, Chunrui Han, Xiangyu Zhang

Modern LVLMs still struggle to achieve fine-grained document understanding, such as OCR/translation/caption for regions of interest to the user, tasks that require the context of the entire page, or even multiple pages.

document understanding Optical Character Recognition (OCR)

Mamba in Speech: Towards an Alternative to Self-Attention

no code implementations • 21 May 2024 • Xiangyu Zhang, Qiquan Zhang, Hexin Liu, Tianyi Xiao, Xinyuan Qian, Beena Ahmed, Eliathamby Ambikairajah, Haizhou Li, Julien Epps

Moreover, experiments demonstrate the effectiveness of BiMamba as an alternative to the self-attention module in Transformer and its derivatives, particularly for the semantic-aware task.

Speech Enhancement speech-recognition +1

On the relevance of pre-neural approaches in natural language processing pedagogy

no code implementations • 16 May 2024 • Aditya Joshi, Jake Renzella, Pushpak Bhattacharyya, Saurav Jha, Xiangyu Zhang

While neural approaches using deep learning are the state-of-the-art for natural language processing (NLP) today, pre-neural algorithms and approaches still find a place in NLP textbooks and courses of recent years.

Music Emotion Prediction Using Recurrent Neural Networks

1 code implementation • 10 May 2024 • Xinyu Chang, Xiangyu Zhang, Haoruo Zhang, Yulu Ran

This study explores the application of recurrent neural networks to recognize emotions conveyed in music, aiming to enhance music recommendation systems and support therapeutic interventions by tailoring music to fit listeners' emotional states.

Music Recommendation Recommendation Systems

Threat Behavior Textual Search by Attention Graph Isomorphism

1 code implementation • 16 Apr 2024 • Chanwoo Bae, Guanhong Tao, Zhuo Zhang, Xiangyu Zhang

As such, analysts often resort to text search techniques to identify existing malware reports based on the symptoms they observe, exploiting the fact that malware samples share a lot of similarity, especially those from the same origin.

Attribute Malware Analysis +2

Self-Supervised Visual Preference Alignment

1 code implementation • 16 Apr 2024 • Ke Zhu, Liang Zhao, Zheng Ge, Xiangyu Zhang

We generate chosen and rejected responses with regard to the original and augmented image pairs, and conduct preference alignment with direct preference optimization.

Ranked #34 on Visual Question Answering on MM-Vet

8k Visual Question Answering

OneChart: Purify the Chart Structural Extraction via One Auxiliary Token

1 code implementation • 15 Apr 2024 • Jinyue Chen, Lingyu Kong, Haoran Wei, Chenglong Liu, Zheng Ge, Liang Zhao, Jianjian Sun, Chunrui Han, Xiangyu Zhang

To address this, we propose OneChart: a reliable agent specifically devised for the structural extraction of chart information.

Decoder

BadPart: Unified Black-box Adversarial Patch Attacks against Pixel-wise Regression Tasks

1 code implementation • 1 Apr 2024 • Zhiyuan Cheng, Zhaoyi Liu, Tengda Guo, Shiwei Feng, Dongfang Liu, Mingjie Tang, Xiangyu Zhang

Our attack prototype, named BadPart, is evaluated on both MDE and OFE tasks, utilizing a total of 7 models.

Adversarial Robustness Autonomous Driving +3

SubjectDrive: Scaling Generative Data in Autonomous Driving via Subject Control

no code implementations • 28 Mar 2024 • Binyuan Huang, Yuqing Wen, Yucheng Zhao, Yaosi Hu, Yingfei Liu, Fan Jia, Weixin Mao, Tiancai Wang, Chi Zhang, Chang Wen Chen, Zhenzhong Chen, Xiangyu Zhang

Autonomous driving progress relies on large-scale annotated datasets.

Autonomous Driving

LOTUS: Evasive and Resilient Backdoor Attacks through Sub-Partitioning

1 code implementation • 25 Mar 2024 • Siyuan Cheng, Guanhong Tao, Yingqi Liu, Guangyu Shen, Shengwei An, Shiwei Feng, Xiangzhe Xu, Kaiyuan Zhang, Shiqing Ma, Xiangyu Zhang

Backdoor attacks pose a significant security threat to Deep Learning applications.

Backdoor Attack

Aligning Speech to Languages to Enhance Code-switching Speech Recognition

no code implementations • 9 Mar 2024 • Hexin Liu, Xiangyu Zhang, Leibny Paola Garcia, Andy W. H. Khong, Eng Siong Chng, Shinji Watanabe

Performance evaluation using large language models reveals the advantage of the linguistic hint by achieving 14.1% and 5.5% relative improvement on test sets of the ASRU and SEAME datasets, respectively.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +4

CodeArt: Better Code Models by Attention Regularization When Symbols Are Lacking

1 code implementation • 19 Feb 2024 • Zian Su, Xiangzhe Xu, Ziyang Huang, Zhuo Zhang, Yapeng Ye, Jianjun Huang, Xiangyu Zhang

Our pre-trained model can improve the SOTAs in these tasks from 53% to 64%, 49% to 60%, and 74% to 94%, respectively.

Language Modelling Masked Language Modeling

When LLMs Meets Acoustic Landmarks: An Efficient Approach to Integrate Speech into Large Language Models for Depression Detection

no code implementations • 17 Feb 2024 • Xiangyu Zhang, Hexin Liu, Kaishuai Xu, Qiquan Zhang, Daijiao Liu, Beena Ahmed, Julien Epps

In addition, this approach is not only valuable for the detection of depression but also represents a new perspective in enhancing the ability of LLMs to comprehend and process speech signals.

Depression Detection

Speaking in Wavelet Domain: A Simple and Efficient Approach to Speed up Speech Diffusion Model

no code implementations • 16 Feb 2024 • Xiangyu Zhang, Daijiao Liu, Hexin Liu, Qiquan Zhang, Hanyu Meng, Leibny Paola Garcia, Eng Siong Chng, Lina Yao

Recently, Denoising Diffusion Probabilistic Models (DDPMs) have attained leading performances across a diverse range of generative tasks.

Denoising Speech Enhancement +1

When Dataflow Analysis Meets Large Language Models

no code implementations • 16 Feb 2024 • Chengpeng Wang, Wuqi Zhang, Zian Su, Xiangzhe Xu, Xiaoheng Xie, Xiangyu Zhang

Dataflow analysis is a powerful code analysis technique that reasons about dependencies between program values, offering support for code optimization, program comprehension, and bug detection.

Hallucination

Rapid Optimization for Jailbreaking LLMs via Subconscious Exploitation and Echopraxia

1 code implementation • 8 Feb 2024 • Guangyu Shen, Siyuan Cheng, Kaiyuan Zhang, Guanhong Tao, Shengwei An, Lu Yan, Zhuo Zhang, Shiqing Ma, Xiangyu Zhang

Large Language Models (LLMs) have become prevalent across diverse sectors, transforming human life with their extraordinary reasoning and comprehension abilities.

MULTIVERSE: Exposing Large Language Model Alignment Problems in Diverse Worlds

no code implementations • 25 Jan 2024 • Xiaolong Jin, Zhuo Zhang, Xiangyu Zhang

Given the low cost of our method, we are able to conduct a large scale study regarding LLM alignment issues in different worlds.

Language Modelling Large Language Model

Small Language Model Meets with Reinforced Vision Vocabulary

no code implementations • 23 Jan 2024 • Haoran Wei, Lingyu Kong, Jinyue Chen, Liang Zhao, Zheng Ge, En Yu, Jianjian Sun, Chunrui Han, Xiangyu Zhang

In Vary-toy, we introduce an improved vision vocabulary, allowing the model to not only possess all features of Vary but also gather more generality.

Ranked #81 on Visual Question Answering on MM-Vet

Language Modelling Large Language Model +3

Stream Query Denoising for Vectorized HD Map Construction

no code implementations • 17 Jan 2024 • Shuo Wang, Fan Jia, Yingfei Liu, Yucheng Zhao, Zehui Chen, Tiancai Wang, Chi Zhang, Xiangyu Zhang, Feng Zhao

This paper introduces the Stream Query Denoising (SQD) strategy as a novel approach for temporal modeling in high-definition map (HD-map) construction.

Autonomous Driving Denoising

Slot-guided Volumetric Object Radiance Fields

no code implementations • NeurIPS 2023 • Di Qi, Tong Yang, Xiangyu Zhang

We hope our approach can provide preliminary understanding of the physical world and help ease future research in 3D object-centric representation learning.

Object Representation Learning

Bootstrap Masked Visual Modeling via Hard Patches Mining

1 code implementation • 21 Dec 2023 • Haochen Wang, Junsong Fan, Yuxi Wang, Kaiyou Song, Tiancai Wang, Xiangyu Zhang, Zhaoxiang Zhang

To empower the model as a teacher, we propose Hard Patches Mining (HPM), predicting patch-wise losses and subsequently determining where to mask.

Vary: Scaling up the Vision Vocabulary for Large Vision-Language Models

1 code implementation • 11 Dec 2023 • Haoran Wei, Lingyu Kong, Jinyue Chen, Liang Zhao, Zheng Ge, Jinrong Yang, Jianjian Sun, Chunrui Han, Xiangyu Zhang

Accordingly, we propose Vary, an efficient and effective method to scale up the vision vocabulary of LVLMs.

Ranked #56 on Visual Question Answering on MM-Vet

Decoder Optical Character Recognition (OCR) +1

Compound Text-Guided Prompt Tuning via Image-Adaptive Cues

1 code implementation • 11 Dec 2023 • Hao Tan, Jun Li, Yizhuang Zhou, Jun Wan, Zhen Lei, Xiangyu Zhang

We introduce text supervision to the optimization of prompts, which enables two benefits: 1) releasing the model from its reliance on the pre-defined category names during inference, thereby enabling more flexible prompt generation; 2) reducing the number of inputs to the text encoder, which decreases GPU memory consumption significantly.

Domain Generalization

Make Them Spill the Beans! Coercive Knowledge Extraction from (Production) LLMs

no code implementations • 8 Dec 2023 • Zhuo Zhang, Guangyu Shen, Guanhong Tao, Siyuan Cheng, Xiangyu Zhang

Instead, it exploits the fact that even when an LLM rejects a toxic request, a harmful response often hides deep in the output logits.

Merlin:Empowering Multimodal LLMs with Foresight Minds

no code implementations • 30 Nov 2023 • En Yu, Liang Zhao, Yana Wei, Jinrong Yang, Dongming Wu, Lingyu Kong, Haoran Wei, Tiancai Wang, Zheng Ge, Xiangyu Zhang, Wenbing Tao

Then, FIT requires MLLMs to first predict trajectories of related objects and then reason about potential future events based on them.

Ranked #66 on Visual Question Answering on MM-Vet

Visual Question Answering

Panacea: Panoramic and Controllable Video Generation for Autonomous Driving

no code implementations • 28 Nov 2023 • Yuqing Wen, Yucheng Zhao, Yingfei Liu, Fan Jia, Yanhui Wang, Chong Luo, Chi Zhang, Tiancai Wang, Xiaoyan Sun, Xiangyu Zhang

This work notably propels the field of autonomous driving by effectively augmenting the training dataset used for advanced BEV perception techniques.

Autonomous Driving Video Generation

Elijah: Eliminating Backdoors Injected in Diffusion Models via Distribution Shift

1 code implementation • 27 Nov 2023 • Shengwei An, Sheng-Yen Chou, Kaiyuan Zhang, QiuLing Xu, Guanhong Tao, Guangyu Shen, Siyuan Cheng, Shiqing Ma, Pin-Yu Chen, Tsung-Yi Ho, Xiangyu Zhang

Diffusion models (DMs) have become state-of-the-art generative models because of their capability to generate high-quality images from noise without adversarial training.

A Quantitative Approach to Understand Self-Supervised Models as Cross-lingual Feature Extractors

1 code implementation • 27 Nov 2023 • Shuyue Stella Li, Beining Xu, Xiangyu Zhang, Hexin Liu, WenHan Chao, Leibny Paola Garcia

There is a positive correlation between PSR scores and ASR performance, suggesting that phonetic information extracted by monolingual SSL models can be used for downstream tasks in cross-lingual settings.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +3

Nova$^+$: Generative Language Models for Binaries

no code implementations • 22 Nov 2023 • Nan Jiang, Chengxiao Wang, Kevin Liu, Xiangzhe Xu, Lin Tan, Xiangyu Zhang

We build Nova$^+$ to further boost Nova using two new pre-training tasks, i.e., optimization generation and optimization level prediction, which are designed to learn binary optimization and align equivalent binaries.

Code Translation Compiler Optimization +2

ADriver-I: A General World Model for Autonomous Driving

no code implementations • 22 Nov 2023 • Fan Jia, Weixin Mao, Yingfei Liu, Yucheng Zhao, Yuqing Wen, Chi Zhang, Xiangyu Zhang, Tiancai Wang

Based on the vision-action pairs, we construct a general world model based on MLLM and diffusion model for autonomous driving, termed ADriver-I.

Autonomous Driving

Hierarchical Semi-Implicit Variational Inference with Application to Diffusion Model Acceleration

1 code implementation • NeurIPS 2023 • Longlin Yu, Tianyu Xie, Yu Zhu, Tong Yang, Xiangyu Zhang, Cheng Zhang

Semi-implicit variational inference (SIVI) has been introduced to expand the analytical variational families by defining expressive semi-implicit distributions in a hierarchical manner.

Bayesian Inference Variational Inference

LAMP: Learn A Motion Pattern for Few-Shot-Based Video Generation

1 code implementation • 16 Oct 2023 • Ruiqi Wu, Liangyu Chen, Tong Yang, Chunle Guo, Chongyi Li, Xiangyu Zhang

Specifically, we design a first-frame-conditioned pipeline that uses an off-the-shelf text-to-image model for content generation so that our tuned video diffusion model mainly focuses on motion learning.

Image Animation Text-to-Image Generation +2

Secondary frequency control of islanded microgrid considering wind and solar stochastics

no code implementations • 8 Oct 2023 • Cheng Zhong, Zhifu Jiang, Xiangyu Zhang, Jikai Chen, Yang Li

Finally, a microgrid simulation model including multiple PV and wind DGs is built, run in various scenarios, and compared with the traditional secondary frequency control method.

Model Predictive Control

Enhancing Code-switching Speech Recognition with Interactive Language Biases

no code implementations • 29 Sep 2023 • Hexin Liu, Leibny Paola Garcia, Xiangyu Zhang, Andy W. H. Khong, Sanjeev Khudanpur

Languages usually switch within a multilingual speech signal, especially in a bilingual society.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

Cold & Warm Net: Addressing Cold-Start Users in Recommender Systems

no code implementations • 27 Sep 2023 • Xiangyu Zhang, Zongqiang Kuang, Zehao Zhang, Fan Huang, Xianfeng Tan

Finally, we evaluate our Cold & Warm Net on public datasets in comparison to models commonly applied in the matching stage and it outperforms other models on all user types.

Knowledge Distillation Meta-Learning +1

Unidirectional brain-computer interface: Artificial neural network encoding natural images to fMRI response in the visual cortex

1 code implementation • 26 Sep 2023 • Ruixing Liang, Xiangyu Zhang, Qiong Li, Lai Wei, Hexin Liu, Avisha Kumar, Kelley M. Kempski Leadingham, Joshua Punnoose, Leibny Paola Garcia, Amir Manbachi

While significant advancements in artificial intelligence (AI) have catalyzed progress across various domains, its full potential in understanding visual perception remains underexplored.

Brain Computer Interface

DreamLLM: Synergistic Multimodal Comprehension and Creation

1 code implementation • 20 Sep 2023 • Runpei Dong, Chunrui Han, Yuang Peng, Zekun Qi, Zheng Ge, Jinrong Yang, Liang Zhao, Jianjian Sun, HongYu Zhou, Haoran Wei, Xiangwen Kong, Xiangyu Zhang, Kaisheng Ma, Li Yi

This paper presents DreamLLM, a learning framework that first achieves versatile Multimodal Large Language Models (MLLMs) empowered with frequently overlooked synergy between multimodal comprehension and creation.

Ranked #2 on Visual Question Answering on MMBench (GPT-3.5 score metric)

multimodal generation Visual Question Answering +2

Language Prompt for Autonomous Driving

2 code implementations • 8 Sep 2023 • Dongming Wu, Wencheng Han, Tiancai Wang, Yingfei Liu, Xiangyu Zhang, Jianbing Shen

A new trend in the computer vision community is to capture objects of interest following flexible human command represented by a natural language prompt.

Autonomous Driving Object

RevColV2: Exploring Disentangled Representations in Masked Image Modeling

1 code implementation • NeurIPS 2023 • Qi Han, Yuxuan Cai, Xiangyu Zhang

This design gives our architecture a nice property: it maintains disentangled low-level and semantic information at the end of the network in MIM pre-training.

Decoder Image Classification +4

Far3D: Expanding the Horizon for Surround-view 3D Object Detection

1 code implementation • 18 Aug 2023 • Xiaohui Jiang, Shuailin Li, Yingfei Liu, Shihao Wang, Fan Jia, Tiancai Wang, Lijin Han, Xiangyu Zhang

Recently 3D object detection from surround-view images has made notable advancements with its low deployment cost.

Ranked #1 on 3D Object Detection on nuScenes Camera Only

3D Object Detection Denoising +1

SCSC: Spatial Cross-scale Convolution Module to Strengthen both CNNs and Transformers

no code implementations • 14 Aug 2023 • Xijun Wang, Xiaojie Chu, Chunrui Han, Xiangyu Zhang

This paper presents a module, Spatial Cross-scale Convolution (SCSC), which is verified to be effective in improving both CNNs and Transformers.

Face Recognition

POSIT: Promotion of Semantic Item Tail via Adversarial Learning

no code implementations • 7 Aug 2023 • QiuLing Xu, Pannaga Shivaswamy, Xiangyu Zhang

We subsequently use that metric in an adversarial learning framework to systematically promote disadvantaged items.

OnlineRefer: A Simple Online Baseline for Referring Video Object Segmentation

1 code implementation • ICCV 2023 • Dongming Wu, Tiancai Wang, Yuang Zhang, Xiangyu Zhang, Jianbing Shen

Referring video object segmentation (RVOS) aims at segmenting an object in a video following human instruction.

Ranked #15 on Referring Expression Segmentation on Refer-YouTube-VOS (2021 public validation)

Referring Expression Segmentation Referring Video Object Segmentation +2

GroupLane: End-to-End 3D Lane Detection with Channel-wise Grouping

no code implementations • 18 Jul 2023 • Zhuoling Li, Chunrui Han, Zheng Ge, Jinrong Yang, En Yu, Haoqian Wang, Hengshuang Zhao, Xiangyu Zhang

Besides, GroupLane with ResNet18 still surpasses PersFormer by 4.9% F1 score, while its inference is nearly 7x faster and its FLOPs are only 13.3% of PersFormer's.

3D Lane Detection

ChatSpot: Bootstrapping Multimodal LLMs via Precise Referring Instruction Tuning

no code implementations • 18 Jul 2023 • Liang Zhao, En Yu, Zheng Ge, Jinrong Yang, Haoran Wei, HongYu Zhou, Jianjian Sun, Yuang Peng, Runpei Dong, Chunrui Han, Xiangyu Zhang

Based on precise referring instruction, we propose ChatSpot, a unified end-to-end multimodal large language model that supports diverse forms of interactivity including mouse clicks, drag-and-drop, and drawing boxes, which provides a more flexible and seamless interactive experience.

Instruction Following Language Modelling +1

Non-Stationary Policy Learning for Multi-Timescale Multi-Agent Reinforcement Learning

no code implementations • 17 Jul 2023 • Patrick Emami, Xiangyu Zhang, David Biagioni, Ahmed S. Zamzam

In detail, we theoretically demonstrate that the effects of non-stationarity introduced by multiple timescales can be learned by a periodic multi-agent policy.

energy management Inductive Bias +3

MERLIon CCS Challenge: A English-Mandarin code-switching child-directed speech corpus for language identification and diarization

1 code implementation • 30 May 2023 • Victoria Y. H. Chua, Hexin Liu, Leibny Paola Garcia Perera, Fei Ting Woon, Jinyi Wong, Xiangyu Zhang, Sanjeev Khudanpur, Andy W. H. Khong, Justin Dauwels, Suzy J. Styles

To enhance the reliability and robustness of language identification (LID) and language diarization (LD) systems for heterogeneous populations and scenarios, there is a need for speech processing models to be trained on datasets that feature diverse language registers and speech patterns.

Language Identification

Backdooring Neural Code Search

1 code implementation • 27 May 2023 • Weisong Sun, Yuchen Chen, Guanhong Tao, Chunrong Fang, Xiangyu Zhang, Quanjun Zhang, Bin Luo

Neural code search models are hence behind many such engines.

Autonomous Driving Code Search +1

MOTRv3: Release-Fetch Supervision for End-to-End Multi-Object Tracking

no code implementations • 23 May 2023 • En Yu, Tiancai Wang, Zhuoling Li, Yuang Zhang, Xiangyu Zhang, Wenbing Tao

Although end-to-end multi-object trackers like MOTR enjoy the merits of simplicity, they suffer seriously from the conflict between detection and association, resulting in unsatisfactory convergence dynamics.

Denoising Multi-Object Tracking +1

Fusion is Not Enough: Single Modal Attacks on Fusion Models for 3D Object Detection

no code implementations • 28 Apr 2023 • Zhiyuan Cheng, Hongjun Choi, James Liang, Shiwei Feng, Guanhong Tao, Dongfang Liu, Michael Zuzak, Xiangyu Zhang

We argue that the weakest link of fusion models depends on their most vulnerable modality, and propose an attack framework that targets advanced camera-LiDAR fusion-based 3D object detection models through camera-only adversarial attacks.

3D Object Detection Autonomous Driving +2

Self-supervised Learning by View Synthesis

no code implementations • 22 Apr 2023 • Shaoteng Liu, Xiangyu Zhang, Tao Hu, Jiaya Jia

In each iteration, the input to VSA is one view (or multiple views) of a 3D object and the output is a synthesized image in another target pose.

3D Classification Decoder +1

Align-DETR: Improving DETR with Simple IoU-aware BCE loss

1 code implementation • 15 Apr 2023 • Zhi Cai, Songtao Liu, Guodong Wang, Zheng Ge, Xiangyu Zhang, Di Huang

We propose a metric, recall of best-regressed samples, to quantitatively evaluate the misalignment problem.

object-detection Object Detection

Detecting Backdoors in Pre-trained Encoders

1 code implementation • CVPR 2023 • Shiwei Feng, Guanhong Tao, Siyuan Cheng, Guangyu Shen, Xiangzhe Xu, Yingqi Liu, Kaiyuan Zhang, Shiqing Ma, Xiangyu Zhang

We show the effectiveness of our method on image encoders pre-trained on ImageNet and OpenAI's CLIP 400 million image-text pairs.

Self-Supervised Learning

Exploring Object-Centric Temporal Modeling for Efficient Multi-View 3D Object Detection

1 code implementation • ICCV 2023 • Shihao Wang, Yingfei Liu, Tiancai Wang, Ying Li, Xiangyu Zhang

On the standard nuScenes benchmark, it is the first online multi-view method that achieves comparable performance (67.6% NDS & 65.3% AMOTA) with lidar-based methods.

Ranked #1 on 3D Multi-Object Tracking on nuScenes Camera Only

3D Multi-Object Tracking 3D Object Detection +2

VoxelNeXt: Fully Sparse VoxelNet for 3D Object Detection and Tracking

2 code implementations • CVPR 2023 • Yukang Chen, Jianhui Liu, Xiangyu Zhang, Xiaojuan Qi, Jiaya Jia

Our core insight is to predict objects directly based on sparse voxel features, without relying on hand-crafted proxies.

Ranked #1 on 3D Object Detection on Argoverse2

3D Object Detection Object +1

Exploring Recurrent Long-term Temporal Fusion for Multi-view 3D Perception

no code implementations • 10 Mar 2023 • Chunrui Han, Jinrong Yang, Jianjian Sun, Zheng Ge, Runpei Dong, HongYu Zhou, Weixin Mao, Yuang Peng, Xiangyu Zhang

In this paper, we explore an embarrassingly simple long-term recurrent fusion strategy built upon the LSS-based methods and find it already able to enjoy the merits from both sides, i.e., rich long-term information and efficient fusion pipeline.

motion prediction object-detection +1

Referring Multi-Object Tracking

1 code implementation • CVPR 2023 • Dongming Wu, Wencheng Han, Tiancai Wang, Xingping Dong, Xiangyu Zhang, Jianbing Shen

In this paper, we propose a new and general referring understanding task, termed referring multi-object tracking (RMOT).

Multi-Object Tracking Object

Contrast with Reconstruct: Contrastive 3D Representation Learning Guided by Generative Pretraining

3 code implementations • 5 Feb 2023 • Zekun Qi, Runpei Dong, Guofan Fan, Zheng Ge, Xiangyu Zhang, Kaisheng Ma, Li Yi

This motivates us to learn 3D representations by sharing the merits of both paradigms, which is non-trivial due to the pattern difference between the two paradigms.

Ranked #1 on Zero-Shot Transfer 3D Point Cloud Classification on ModelNet10 (using extra training data)

3D Point Cloud Linear Classification Decoder +3

KNOD: Domain Knowledge Distilled Tree Decoder for Automated Program Repair

1 code implementation • 3 Feb 2023 • Nan Jiang, Thibaud Lutellier, Yiling Lou, Lin Tan, Dan Goldwasser, Xiangyu Zhang

KNOD has two major novelties, including (1) a novel three-stage tree decoder, which directly generates Abstract Syntax Trees of patched code according to the inherent tree structure, and (2) a novel domain-rule distillation, which leverages syntactic and semantic rules and teacher-student distributions to explicitly inject the domain knowledge into the decoding procedure during both the training and inference phases.

Decoder Program Repair

Adversarial Training of Self-supervised Monocular Depth Estimation against Physical-World Attacks

1 code implementation • 31 Jan 2023 • Zhiyuan Cheng, James Liang, Guanhong Tao, Dongfang Liu, Xiangyu Zhang

We improve adversarial robustness against physical-world attacks using L0-norm-bounded perturbation in training.

Adversarial Robustness Autonomous Driving +2

BEAGLE: Forensics of Deep Learning Backdoor Attack for Better Defense

1 code implementation • 16 Jan 2023 • Siyuan Cheng, Guanhong Tao, Yingqi Liu, Shengwei An, Xiangzhe Xu, Shiwei Feng, Guangyu Shen, Kaiyuan Zhang, QiuLing Xu, Shiqing Ma, Xiangyu Zhang

Attack forensics, a critical counter-measure for traditional cyber attacks, is hence of importance for defending against model backdoor attacks.

Backdoor Attack

Understanding Imbalanced Semantic Segmentation Through Neural Collapse

2 code implementations • CVPR 2023 • Zhisheng Zhong, Jiequan Cui, Yibo Yang, Xiaoyang Wu, Xiaojuan Qi, Xiangyu Zhang, Jiaya Jia

Based on our empirical and theoretical analysis, we point out that semantic segmentation naturally brings contextual correlation and imbalanced distribution among classes, which breaks the equiangular and maximally separated structure of neural collapse for both feature centers and classifiers.

3D Semantic Segmentation Segmentation

Cross Modal Transformer: Towards Fast and Robust 3D Object Detection

2 code implementations • ICCV 2023 • Junjie Yan, Yingfei Liu, Jianjian Sun, Fan Jia, Shuailin Li, Tiancai Wang, Xiangyu Zhang

In this paper, we propose a robust 3D detector, named Cross Modal Transformer (CMT), for end-to-end 3D multi-modal detection.

object-detection Object Tracking +1

MEDIC: Remove Model Backdoors via Importance Driven Cloning

no code implementations • CVPR 2023 • QiuLing Xu, Guanhong Tao, Jean Honorio, Yingqi Liu, Shengwei An, Guangyu Shen, Siyuan Cheng, Xiangyu Zhang

It trains the clone model from scratch on a very small subset of samples and aims to minimize a cloning loss that denotes the differences between the activations of important neurons across the two models.

Knowledge Distillation

Reversible Column Networks

1 code implementation • 22 Dec 2022 • Yuxuan Cai, Yizhuang Zhou, Qi Han, Jianjian Sun, Xiangwen Kong, Jun Li, Xiangyu Zhang

This architectural scheme gives RevCol very different behavior from conventional networks: during forward propagation, features in RevCol are learned to be gradually disentangled when passing through each column, and their total information is maintained rather than compressed or discarded as in other networks.

Ranked #8 on Semantic Segmentation on ADE20K (using extra training data)

Image Classification object-detection +3

Backdoor Vulnerabilities in Normally Trained Deep Learning Models

no code implementations • 29 Nov 2022 • Guanhong Tao, Zhenting Wang, Siyuan Cheng, Shiqing Ma, Shengwei An, Yingqi Liu, Guangyu Shen, Zhuo Zhang, Yunshu Mao, Xiangyu Zhang

We leverage 20 different types of injected backdoor attacks in the literature as the guidance and study their correspondences in normally trained models, which we call natural backdoor vulnerabilities.

Data Poisoning

Near-Field Channel Estimation for Extremely Large-Scale Array Communications: A model-based deep learning approach

no code implementations • 28 Nov 2022 • Xiangyu Zhang, Zening Wang, Haiyang Zhang, Luxi Yang

In particular, we first formulate the XL-MIMO near-field channel estimation task as a compressed sensing problem using the spatial gridding-based sparsifying dictionary, and then solve the resulting problem by applying the Learning Iterative Shrinkage and Thresholding Algorithm (LISTA).

Dictionary Learning

Twin-S: A Digital Twin for Skull-base Surgery

1 code implementation • 21 Nov 2022 • Hongchao Shu, Ruixing Liang, Zhaoshuo Li, Anna Goodridge, Xiangyu Zhang, Hao Ding, Nimesh Nagururu, Manish Sahu, Francis X. Creighton, Russell H. Taylor, Adnan Munawar, Mathias Unberath

Twin-S tracks and updates the virtual model in real-time given measurements from modern tracking technologies.

Anatomy Mixed Reality

MatrixVT: Efficient Multi-Camera to BEV Transformation for 3D Perception

2 code implementations • ICCV 2023 • HongYu Zhou, Zheng Ge, Zeming Li, Xiangyu Zhang

This paper proposes an efficient multi-camera to Bird's-Eye-View (BEV) view transformation method for 3D perception, dubbed MatrixVT.

Ranked #2 on Bird's-Eye View Semantic Segmentation on nuScenes (IoU lane - 224x480 - 100x100 at 0.5 metric)

Autonomous Driving Bird's-Eye View Semantic Segmentation +2

MOTRv2: Bootstrapping End-to-End Multi-Object Tracking by Pretrained Object Detectors

4 code implementations • CVPR 2023 • Yuang Zhang, Tiancai Wang, Xiangyu Zhang

In this paper, we propose MOTRv2, a simple yet effective pipeline to bootstrap end-to-end multi-object tracking with a pretrained object detector.

Ranked #2 on Multi-Object Tracking on DanceTrack (using extra training data)

Multi-Object Tracking Multiple Object Tracking with Transformer +2

365

Paper
Code

Towards 3D Object Detection with 2D Supervision

no code implementations • 15 Nov 2022 • Jinrong Yang, Tiancai Wang, Zheng Ge, Weixin Mao, Xiaoping Li, Xiangyu Zhang

We propose a temporal 2D transformation to bridge the 3D predictions with temporal 2D labels.

3D Object Detection Object +1

Paper
Add Code

The 1st-place Solution for ECCV 2022 Multiple People Tracking in Group Dance Challenge

3 code implementations • 27 Oct 2022 • Yuang Zhang, Tiancai Wang, Weiyao Lin, Xiangyu Zhang

We present our 1st place solution to the Group Dance Multiple People Tracking Challenge.

Multi-Object Tracking Multiple Object Tracking with Transformer +1

336

Paper
Code

FLIP: A Provable Defense Framework for Backdoor Mitigation in Federated Learning

1 code implementation • 23 Oct 2022 • Kaiyuan Zhang, Guanhong Tao, QiuLing Xu, Siyuan Cheng, Shengwei An, Yingqi Liu, Shiwei Feng, Guangyu Shen, Pin-Yu Chen, Shiqing Ma, Xiangyu Zhang

In this work, we theoretically analyze the connection among cross-entropy loss, attack success rate, and clean accuracy in this setting.

Backdoor Attack backdoor defense +1

Paper
Code

A New Approach to Extract Fetal Electrocardiogram Using Affine Combination of Adaptive Filters

no code implementations • 21 Oct 2022 • Yu Xuan, Xiangyu Zhang, Shuyue Stella Li, Zihan Shen, Xin Xie, Leibny Paola Garcia, Roberto Togneri

Compared with the state-of-the-art MSF-ANC method, CRLS shows improved performance.

Paper
Add Code

From Model-Based to Model-Free: Learning Building Control for Demand Response

1 code implementation • 18 Oct 2022 • David Biagioni, Xiangyu Zhang, Christiane Adcock, Michael Sinner, Peter Graf, Jennifer King

We demonstrate, in this context, that hybrid methods offer many benefits over both purely model-free and model-based methods as long as certain requirements are met.

Paper
Code

PQLM -- Multilingual Decentralized Portable Quantum Language Model for Privacy Protection

no code implementations • 6 Oct 2022 • Shuyue Stella Li, Xiangyu Zhang, Shu Zhou, Hongchao Shu, Ruixing Liang, Hexin Liu, Leibny Paola Garcia

In this work, we propose a highly Portable Quantum Language Model (PQLM) that can easily transmit information to downstream tasks on classical machines.

Language Modelling Sentence Embedding +3

Paper
Add Code

End-to-End Lyrics Recognition with Self-supervised Learning

no code implementations • 26 Sep 2022 • Xiangyu Zhang, Shuyue Stella Li, Zhanhong He, Roberto Togneri, Leibny Paola Garcia

Lyrics recognition is an important task in music processing.

Contrastive Learning Domain Generalization +2

Paper
Add Code

Differentiable Architecture Search with Random Features

no code implementations • CVPR 2023 • Xuanyang Zhang, Yonggang Li, Xiangyu Zhang, Yongtao Wang, Jian Sun

Differentiable architecture search (DARTS) has significantly promoted the development of NAS techniques because of its high search efficiency and effectiveness but suffers from performance collapse.

Ranked #13 on Neural Architecture Search on NAS-Bench-201, CIFAR-10

Neural Architecture Search

Paper
Add Code

Understanding Masked Image Modeling via Learning Occlusion Invariant Feature

no code implementations • CVPR 2023 • Xiangwen Kong, Xiangyu Zhang

Recently, Masked Image Modeling (MIM) has achieved great success in self-supervised visual recognition.

Contrastive Learning Open-Ended Question Answering

Paper
Add Code

Revisiting the Critical Factors of Augmentation-Invariant Representation Learning

1 code implementation • 30 Jul 2022 • Junqiang Huang, Xiangwen Kong, Xiangyu Zhang

We focus on better understanding the critical factors of augmentation-invariant representation learning.

Linear evaluation Representation Learning

Paper
Code

Physical Attack on Monocular Depth Estimation with Optimal Adversarial Patches

1 code implementation • 11 Jul 2022 • Zhiyuan Cheng, James Liang, Hongjun Choi, Guanhong Tao, Zhiwen Cao, Dongfang Liu, Xiangyu Zhang

Experimental results show that our method can generate stealthy, effective, and robust adversarial patches for different target objects and models and achieves more than 6 meters mean depth estimation error and 93% attack success rate (ASR) in object detection with a patch of 1/9 of the vehicle's rear area.

3D Object Detection Autonomous Driving +3

Paper
Code

Robust Watermarking for Video Forgery Detection with Improved Imperceptibility and Robustness

no code implementations • 7 Jul 2022 • Yangming Zhou, Qichao Ying, Xiangyu Zhang, Zhenxing Qian, Sheng Li, Xinpeng Zhang

We jointly train a 3D-UNet-based watermark embedding network and a decoder that predicts the tampering mask.

Decoder Video Compression

Paper
Add Code

LargeKernel3D: Scaling up Kernels in 3D Sparse CNNs

2 code implementations • CVPR 2023 • Yukang Chen, Jianhui Liu, Xiangyu Zhang, Xiaojuan Qi, Jiaya Jia

Recent advances in 2D CNNs have revealed that large kernels are important.

3D Object Detection Object +3

362

Paper
Code

DECK: Model Hardening for Defending Pervasive Backdoors

no code implementations • 18 Jun 2022 • Guanhong Tao, Yingqi Liu, Siyuan Cheng, Shengwei An, Zhuo Zhang, QiuLing Xu, Guangyu Shen, Xiangyu Zhang

As such, using the samples derived from our attack in adversarial training can harden a model against these backdoor vulnerabilities.

Decoder

Paper
Add Code

PETRv2: A Unified Framework for 3D Perception from Multi-Camera Images

1 code implementation • ICCV 2023 • Yingfei Liu, Junjie Yan, Fan Jia, Shuailin Li, Aqi Gao, Tiancai Wang, Xiangyu Zhang, Jian Sun

More specifically, we extend the 3D position embedding (3D PE) in PETR for temporal modeling.

Ranked #2 on Bird's-Eye View Semantic Segmentation on nuScenes (IoU lane - 224x480 - 100x100 at 0.5 metric)

3D Lane Detection 3D Object Detection +6

803

Paper
Code

Self-Supervised Visual Representation Learning with Semantic Grouping

1 code implementation • 30 May 2022 • Xin Wen, Bingchen Zhao, Anlin Zheng, Xiangyu Zhang, Xiaojuan Qi

The semantic grouping is performed by assigning pixels to a set of learnable prototypes, which can adapt to each sample by attentive pooling over the feature and form new slots.

Ranked #15 on Unsupervised Semantic Segmentation on COCO-Stuff-27 (Accuracy metric)

Contrastive Learning Instance Segmentation +6

Paper
Code

Re-parameterizing Your Optimizers rather than Architectures

1 code implementation • 30 May 2022 • Xiaohan Ding, Honghao Chen, Xiangyu Zhang, Kaiqi Huang, Jungong Han, Guiguang Ding

For the extreme simplicity of model structure, we focus on a VGG-style plain model and showcase that such a simple model trained with a RepOptimizer, which is referred to as RepOpt-VGG, performs on par with or better than the recent well-designed models.

Quantization

246

Paper
Code

GL-RG: Global-Local Representation Granularity for Video Captioning

1 code implementation • 22 May 2022 • Liqi Yan, Qifan Wang, Yiming Cui, Fuli Feng, Xiaojun Quan, Xiangyu Zhang, Dongfang Liu

Video captioning is a challenging task as it needs to accurately transform visual understanding into natural language description.

Caption Generation Descriptive +1

Paper
Code

Focal Sparse Convolutional Networks for 3D Object Detection

2 code implementations • CVPR 2022 • Yukang Chen, Yanwei Li, Xiangyu Zhang, Jian Sun, Jiaya Jia

In this paper, we introduce two new modules to enhance the capability of Sparse CNNs, both of which are based on making feature sparsity learnable with position-wise importance prediction.

3D Object Detection Object +1

362

Paper
Code

When NAS Meets Trees: An Efficient Algorithm for Neural Architecture Search

1 code implementation • 11 Apr 2022 • Guocheng Qian, Xuanyang Zhang, Guohao Li, Chen Zhao, Yukang Chen, Xiangyu Zhang, Bernard Ghanem, Jian Sun

TNAS performs a modified bi-level Breadth-First Search in the proposed trees to discover a high-performance architecture.

Ranked #8 on Neural Architecture Search on NAS-Bench-201, CIFAR-10

Neural Architecture Search

Paper
Code

Simple Baselines for Image Restoration

9 code implementations • 10 Apr 2022 • Liangyu Chen, Xiaojie Chu, Xiangyu Zhang, Jian Sun

Although there have been significant advances in the field of image restoration recently, the system complexity of the state-of-the-art (SOTA) methods is increasing as well, which may hinder the convenient analysis and comparison of methods.

Ranked #1 on Deblurring on MSU BASED

Deblurring Image Deblurring +2

2,040

Paper
Code

Near-optimality for infinite-horizon restless bandits with many arms

no code implementations • 29 Mar 2022 • Xiangyu Zhang, Peter I. Frazier

Although an average-case-optimal policy can be computed via stochastic dynamic programming, the computation required grows exponentially with the number of arms $N$.

Active Learning Management +1

Paper
Add Code

Tree Energy Loss: Towards Sparsely Annotated Semantic Segmentation

1 code implementation • CVPR 2022 • Zhiyuan Liang, Tiancai Wang, Xiangyu Zhang, Jian Sun, Jianbing Shen

The tree energy loss is effective and easy to incorporate into existing frameworks by combining it with a traditional segmentation loss.

Segmentation Semantic Segmentation

102

Paper
Code

Progressive End-to-End Object Detection in Crowded Scenes

2 code implementations • CVPR 2022 • Anlin Zheng, Yuang Zhang, Xiangyu Zhang, Xiaojuan Qi, Jian Sun

Experiments show that our method can significantly boost the performance of query-based detectors in crowded scenes.

Ranked #1 on Object Detection on CrowdHuman

Object object-detection +1

Paper
Code

Scaling Up Your Kernels to 31x31: Revisiting Large Kernel Design in CNNs

7 code implementations • CVPR 2022 • Xiaohan Ding, Xiangyu Zhang, Yizhuang Zhou, Jungong Han, Guiguang Ding, Jian Sun

We revisit large kernel design in modern convolutional neural networks (CNNs).

Ranked #75 on Image Classification on ImageNet

Image Classification

3,037

Paper
Code

PETR: Position Embedding Transformation for Multi-View 3D Object Detection

1 code implementation • 10 Mar 2022 • Yingfei Liu, Tiancai Wang, Xiangyu Zhang, Jian Sun

Object query can perceive the 3D position-aware features and perform end-to-end object detection.

Ranked #3 on 3D Object Detection on 3D Object Detection on Argoverse2 Camera Only

3D Object Detection Object +3

803

Paper
Code

Curriculum-based Reinforcement Learning for Distribution System Critical Load Restoration

1 code implementation • 8 Mar 2022 • Xiangyu Zhang, Abinet Tesfaye Eseye, Bernard Knueven, Weijia Liu, Matthew Reynolds, Wesley Jones

This paper focuses on the critical load restoration problem in distribution systems following major outages.

Decision Making reinforcement-learning +1

Paper
Code

Constrained Optimization with Dynamic Bound-scaling for Effective NLP Backdoor Defense

no code implementations • 11 Feb 2022 • Guangyu Shen, Yingqi Liu, Guanhong Tao, QiuLing Xu, Zhuo Zhang, Shengwei An, Shiqing Ma, Xiangyu Zhang

We develop a novel optimization method for NLP backdoor inversion.

Paper
Add Code

Relieving Long-tailed Instance Segmentation via Pairwise Class Balance

2 code implementations • CVPR 2022 • Yin-Yin He, Peizhen Zhang, Xiu-Shen Wei, Xiangyu Zhang, Jian Sun

In this paper, we explore excavating the confusion matrix, which carries the fine-grained misclassification details, to relieve the pairwise biases, generalizing the coarse one.

Instance Segmentation Semantic Segmentation

Paper
Code

Communication-Efficient TeraByte-Scale Model Training Framework for Online Advertising

no code implementations • 5 Jan 2022 • Weijie Zhao, Xuewu Jiao, Mingqing Hu, Xiaoyun Li, Xiangyu Zhang, Ping Li

In this paper, we propose a hardware-aware training workflow that couples the hardware topology into the algorithm design.

Click-Through Rate Prediction

Paper
Add Code

Bounded Adversarial Attack on Deep Content Features

1 code implementation • CVPR 2022 • QiuLing Xu, Guanhong Tao, Xiangyu Zhang

We propose a novel adversarial attack targeting content features in some deep layer, that is, individual neurons in the layer.

Adversarial Attack

Paper
Code

Better Trigger Inversion Optimization in Backdoor Scanning

no code implementations • CVPR 2022 • Guanhong Tao, Guangyu Shen, Yingqi Liu, Shengwei An, QiuLing Xu, Shiqing Ma, Pan Li, Xiangyu Zhang

A popular approach to trigger inversion is optimization.

Paper
Add Code

Complex Backdoor Detection by Symmetric Feature Differencing

1 code implementation • CVPR 2022 • Yingqi Liu, Guangyu Shen, Guanhong Tao, Zhenting Wang, Shiqing Ma, Xiangyu Zhang

Our results on the TrojAI competition rounds 2-4, which have patch backdoors and filter backdoors, show that existing scanners may produce hundreds of false positives (i.e., clean models recognized as trojaned), while our technique removes 78-100% of them with a small increase of false negatives by 0-30%, leading to 17-41% overall accuracy improvement.

Paper
Code

RepMLPNet: Hierarchical Vision MLP with Re-parameterized Locality

4 code implementations • CVPR 2022 • Xiaohan Ding, Honghao Chen, Xiangyu Zhang, Jungong Han, Guiguang Ding

Our results reveal that 1) Locality Injection is a general methodology for MLP models; 2) RepMLPNet has a favorable accuracy-efficiency trade-off compared to the other MLPs; 3) RepMLPNet is the first MLP that seamlessly transfers to Cityscapes semantic segmentation.

Ranked #61 on Semantic Segmentation on Cityscapes val

Image Classification Semantic Segmentation

3,037

Paper
Code

On Efficient Transformer-Based Image Pre-training for Low-Level Vision

1 code implementation • 19 Dec 2021 • Wenbo Li, Xin Lu, Shengju Qian, Jiangbo Lu, Xiangyu Zhang, Jiaya Jia

Pre-training has set numerous state-of-the-art results in high-level computer vision, while few attempts have been made to investigate how pre-training acts in image processing systems.

Ranked #5 on Image Super-Resolution on Set5 - 2x upscaling (using extra training data)

Denoising Image Super-Resolution

125

Paper
Code

Implicit Feature Refinement for Instance Segmentation

1 code implementation • 9 Dec 2021 • Lufan Ma, Tiancai Wang, Bin Dong, Jiangpeng Yan, Xiu Li, Xiangyu Zhang

Our IFR enjoys several advantages: 1) it simulates an infinite-depth refinement network while only requiring the parameters of a single residual block; 2) it produces high-level equilibrium instance features with a global receptive field; 3) it serves as a plug-and-play general module that is easily extended to most object recognition frameworks.

Instance Segmentation Object Recognition +3

Paper
Code

Two-step Lookahead Bayesian Optimization with Inequality Constraints

no code implementations • 6 Dec 2021 • Yunxiang Zhang, Xiangyu Zhang, Peter I. Frazier

Recent advances in computationally efficient non-myopic Bayesian optimization (BO) improve query efficiency over traditional myopic methods like expected improvement while only modestly increasing computational cost.

Bayesian Optimization Vocal Bursts Valence Prediction

Paper
Add Code

Spherical Motion Dynamics: Learning Dynamics of Normalized Neural Network using SGD and Weight Decay

no code implementations • NeurIPS 2021 • Ruosi Wan, Zhanxing Zhu, Xiangyu Zhang, Jian Sun

Specifically, 1) we introduce the assumptions that can lead to equilibrium state in SMD, and prove equilibrium can be reached in a linear rate regime under given assumptions; 2) we propose "angular update" as a substitute for effective learning rate to depict the state of SMD, and derive the theoretical value of angular update in equilibrium state; 3) we verify our assumptions and theoretical results on various large-scale computer vision tasks including ImageNet and MSCOCO with standard settings.

Paper
Add Code

Constrained Two-step Look-Ahead Bayesian Optimization

no code implementations • NeurIPS 2021 • Yunxiang Zhang, Xiangyu Zhang, Peter Frazier

Recent advances in computationally efficient non-myopic Bayesian optimization offer improved query efficiency over traditional myopic methods like expected improvement, with only a modest increase in computational cost.

Bayesian Optimization Vocal Bursts Valence Prediction

Paper
Add Code

PowerGridworld: A Framework for Multi-Agent Reinforcement Learning in Power Systems

1 code implementation • 10 Nov 2021 • David Biagioni, Xiangyu Zhang, Dylan Wald, Deepthi Vaidhynathan, Rohit Chintala, Jennifer King, Ahmed S. Zamzam

We present the PowerGridworld software package to provide users with a lightweight, modular, and customizable framework for creating power-systems-focused, multi-agent Gym environments that readily integrate with existing training frameworks for reinforcement learning (RL).

Multi-agent Reinforcement Learning reinforcement-learning +1

Paper
Code

A Comparison of Model-Free and Model Predictive Control for Price Responsive Water Heaters

no code implementations • 8 Nov 2021 • David J. Biagioni, Xiangyu Zhang, Peter Graf, Devon Sigler, Wesley Jones

We demonstrate that optimal control for this problem is challenging, requiring more than 8-hour lookahead for MPC with perfect forecasting to attain the minimum cost.

Model Predictive Control Time Series +1

Paper
Add Code

Raw Bayer Pattern Image Synthesis for Computer Vision-oriented Image Signal Processing Pipeline Design

no code implementations • 25 Oct 2021 • Wei Zhou, Xiangyu Zhang, Hongyu Wang, Shenghua Gao, Xin Lou

It is shown that by adding another transformation, the proposed method is able to synthesize high-quality RAW Bayer images with arbitrary size.

Demosaicking Image Generation +3

Paper
Add Code

Instance-Conditional Knowledge Distillation for Object Detection

1 code implementation • NeurIPS 2021 • Zijian Kang, Peizhen Zhang, Xiangyu Zhang, Jian Sun, Nanning Zheng

Knowledge distillation has shown great success in classification; however, it is still challenging for detection.

Image Classification Knowledge Distillation +3

Paper
Code

RWN: Robust Watermarking Network for Image Cropping Localization

no code implementations • 12 Oct 2021 • Qichao Ying, Xiaoxiao Hu, Xiangyu Zhang, Zhenxing Qian, Xinpeng Zhang

At the recipient's side, ACP extracts the watermark from the attacked image, and we conduct feature matching on the original and extracted watermark to locate the position of the crop in the original image plane.

Image Cropping Image Forensics

Paper
Add Code

Partial to Whole Knowledge Distillation: Progressive Distilling Decomposed Knowledge Boosts Student Better

no code implementations • 26 Sep 2021 • Xuanyang Zhang, Xiangyu Zhang, Jian Sun

The knowledge distillation field delicately designs various types of knowledge to shrink the performance gap between a compact student and a large-scale teacher.

Knowledge Distillation

Paper
Add Code

LGD: Label-guided Self-distillation for Object Detection

1 code implementation • 23 Sep 2021 • Peizhen Zhang, Zijian Kang, Tong Yang, Xiangyu Zhang, Nanning Zheng, Jian Sun

Instead, we generate instructive knowledge based only on student representations and regular labels.

Instance Segmentation Object +4

Paper
Code

Anchor DETR: Query Design for Transformer-Based Object Detection

2 code implementations • 15 Sep 2021 • Yingming Wang, Xiangyu Zhang, Tong Yang, Jian Sun

Thanks to the query design and the attention variant, the proposed detector, which we call Anchor DETR, can achieve better performance and run faster than DETR with 10$\times$ fewer training epochs.

Object object-detection +1

327

Paper
Code

Image Synthesis via Semantic Composition

no code implementations • ICCV 2021 • Yi Wang, Lu Qi, Ying-Cong Chen, Xiangyu Zhang, Jiaya Jia

In this paper, we present a novel approach to synthesize realistic images based on their semantic layouts.

Image Generation Semantic Composition

Paper
Add Code

Accelerating Markov Random Field Inference with Uncertainty Quantification

no code implementations • 2 Aug 2021 • Ramin Bashizade, Xiangyu Zhang, Sayan Mukherjee, Alvin R. Lebeck

In this paper, we propose a high-throughput accelerator for Markov Random Field (MRF) inference, a powerful model for representing a wide range of applications, using MCMC with Gibbs sampling.

Motion Estimation Playing the Game of 2048 +1

Paper
Add Code

Restless Bandits with Many Arms: Beating the Central Limit Theorem

no code implementations • 25 Jul 2021 • Xiangyu Zhang, Peter I. Frazier

Thus, there is substantial value in understanding the performance of index policies and other policies that can be computed efficiently for large $N$.

Active Learning Management +1

Paper
Add Code

The Threat of Offensive AI to Organizations

no code implementations • 30 Jun 2021 • Yisroel Mirsky, Ambra Demontis, Jaidip Kotak, Ram Shankar, Deng Gelei, Liu Yang, Xiangyu Zhang, Wenke Lee, Yuval Elovici, Battista Biggio

Although offensive AI has been discussed in the past, there is a need to analyze and understand the threat in the context of organizations.

Paper
Add Code

SOLQ: Segmenting Objects by Learning Queries

1 code implementation • NeurIPS 2021 • Bin Dong, Fangao Zeng, Tiancai Wang, Xiangyu Zhang, Yichen Wei

Moreover, the joint learning of unified query representation can greatly improve the detection performance of DETR.

Ranked #4 on Object Detection on COCO minival (AP75 metric)

Instance Segmentation Object Detection +2

196

Paper
Code

Fast Camera Image Denoising on Mobile GPUs with Deep Learning, Mobile AI 2021 Challenge: Report

no code implementations • 17 May 2021 • Andrey Ignatov, Kim Byeoung-su, Radu Timofte, Angeline Pouget, Fenglong Song, Cheng Li, Shuai Xiao, Zhongqian Fu, Matteo Maggioni, Yibin Huang, Shen Cheng, Xin Lu, Yifeng Zhou, Liangyu Chen, Donghao Liu, Xiangyu Zhang, Haoqiang Fan, Jian Sun, Shuaicheng Liu, Minsu Kwon, Myungje Lee, Jaeyoon Yoo, Changbeom Kang, Shinjo Wang, Bin Huang, Tianbao Zhou, Shuai Liu, Lei Lei, Chaoyu Feng, Liguang Huang, Zhikun Lei, Feifei Chen

A detailed description of all models developed in the challenge is provided in this paper.

Image Denoising

Paper
Add Code

MOTR: End-to-End Multiple-Object Tracking with Transformer

2 code implementations • 7 May 2021 • Fangao Zeng, Bin Dong, Yuang Zhang, Tiancai Wang, Xiangyu Zhang, Yichen Wei

Temporal modeling of objects is a key challenge in multiple object tracking (MOT).

Ranked #1 on Multi-Object Tracking on MOT17 (e2e-MOT metric)

Multi-Object Tracking Multiple Object Tracking with Transformer +1

561

Paper
Code

RepMLP: Re-parameterizing Convolutions into Fully-connected Layers for Image Recognition

10 code implementations • 5 May 2021 • Xiaohan Ding, Chunlong Xia, Xiangyu Zhang, Xiaojie Chu, Jungong Han, Guiguang Ding

We propose RepMLP, a multi-layer-perceptron-style neural network building block for image recognition, which is composed of a series of fully-connected (FC) layers.

Ranked #754 on Image Classification on ImageNet

Face Recognition Image Classification +1

11,022

Paper
Code

Points as Queries: Weakly Semi-supervised Object Detection by Points

1 code implementation • CVPR 2021 • Liangyu Chen, Tong Yang, Xiangyu Zhang, Wei Zhang, Jian Sun

We propose a novel point-annotated setting for the weakly semi-supervised object detection task, in which the dataset comprises a small set of fully annotated images and a large set of images weakly annotated by points.

object-detection Object Detection +1

Paper
Code

Joint User Association and Power Allocation in Heterogeneous Ultra Dense Network via Semi-Supervised Representation Learning

no code implementations • 29 Mar 2021 • Xiangyu Zhang, Zhengming Zhang, Luxi Yang

We model the HUDNs as a heterogeneous graph and train a Graph Neural Network (GNN) to approach this representation function by using semi-supervised learning, in which the loss function is composed of the unsupervised part that helps the GNN approach the optimal representation function and the supervised part that utilizes the previous experience to reduce useless exploration.

Computational Efficiency Graph Neural Network +1

Paper
Add Code

Diverse Branch Block: Building a Convolution as an Inception-like Unit

2 code implementations • CVPR 2021 • Xiaohan Ding, Xiangyu Zhang, Jungong Han, Guiguang Ding

We propose a universal building block of Convolutional Neural Network (ConvNet) to improve the performance without any inference-time costs.

Image Classification object-detection +2

310

Paper
Code

You Only Look One-level Feature

6 code implementations • CVPR 2021 • Qiang Chen, Yingming Wang, Tong Yang, Xiangyu Zhang, Jian Cheng, Jian Sun

From the perspective of optimization, we introduce an alternative way to address the problem instead of adopting the complex feature pyramids: utilizing only one-level feature for detection.

Ranked #142 on Object Detection on COCO test-dev

object-detection Object Detection

28,200

Paper
Code

EX-RAY: Distinguishing Injected Backdoor from Natural Features in Neural Networks by Examining Differential Feature Symmetry

no code implementations • 16 Mar 2021 • Yingqi Liu, Guangyu Shen, Guanhong Tao, Zhenting Wang, Shiqing Ma, Xiangyu Zhang

A prominent challenge is hence to distinguish natural features and injected backdoors.

Backdoor Attack

Paper
Add Code

Backdoor Scanning for Deep Neural Networks through K-Arm Optimization

1 code implementation • 9 Feb 2021 • Guangyu Shen, Yingqi Liu, Guanhong Tao, Shengwei An, QiuLing Xu, Siyuan Cheng, Shiqing Ma, Xiangyu Zhang

By iteratively and stochastically selecting the most promising labels for optimization with the guidance of an objective function, we substantially reduce the complexity, allowing us to handle models with many classes.

Paper
Code

Neural Architecture Search with Random Labels

1 code implementation • CVPR 2021 • Xuanyang Zhang, Pengfei Hou, Xiangyu Zhang, Jian Sun

In this paper, we investigate a new variant of the neural architecture search (NAS) paradigm: searching with random labels (RLNAS).

Neural Architecture Search

Paper
Code

RepVGG: Making VGG-style ConvNets Great Again

23 code implementations • CVPR 2021 • Xiaohan Ding, Xiangyu Zhang, Ningning Ma, Jungong Han, Guiguang Ding, Jian Sun

We present a simple but powerful architecture of convolutional neural network, which has a VGG-like inference-time body composed of nothing but a stack of 3x3 convolution and ReLU, while the training-time model has a multi-branch topology.

Ranked #44 on Semantic Segmentation on Cityscapes val

Image Classification Semantic Segmentation

30,258

Paper
Code

Implicit Feature Pyramid Network for Object Detection

no code implementations • 25 Dec 2020 • Tiancai Wang, Xiangyu Zhang, Jian Sun

In this paper, we present an implicit feature pyramid network (i-FPN) for object detection.

Object object-detection +1

Paper
Add Code

Deep Feature Space Trojan Attack of Neural Networks by Controlled Detoxification

2 code implementations • 21 Dec 2020 • Siyuan Cheng, Yingqi Liu, Shiqing Ma, Xiangyu Zhang

Trojan (backdoor) attack is a form of adversarial attack on deep neural networks where the attacker provides victims with a model trained/retrained on malicious data.

Backdoor Attack

Paper
Code

Rethinking Learnable Tree Filter for Generic Feature Transform

1 code implementation • NeurIPS 2020 • Lin Song, Yanwei Li, Zhengkai Jiang, Zeming Li, Xiangyu Zhang, Hongbin Sun, Jian Sun, Nanning Zheng

The Learnable Tree Filter presents a remarkable approach to model structure-preserving relations for semantic segmentation.

Instance Segmentation object-detection +3

Paper
Code

Co-mining: Self-Supervised Learning for Sparsely Annotated Object Detection

1 code implementation • 3 Dec 2020 • Tiancai Wang, Tong Yang, Jiale Cao, Xiangyu Zhang

Object detectors usually achieve promising results with the supervision of complete instance annotations.

MULTI-VIEW LEARNING Object +4

Paper
Code

Microlensing Predictions: Impact of Galactic Disc Dynamical Models

no code implementations • 30 Oct 2020 • Hongjing Yang, Shude Mao, Weicheng Zang, Xiangyu Zhang

Additionally, we find the asymptotic power-law behaviors in both $\theta_{\rm E}$ and $\pi_{\rm E}$ distributions, and we provide a simple model to understand them.

Astrophysics of Galaxies Earth and Planetary Astrophysics Solar and Stellar Astrophysics

Paper
Add Code

Joint COCO and Mapillary Workshop at ICCV 2019: COCO Instance Segmentation Challenge Track

no code implementations • 6 Oct 2020 • Zeming Li, Yuchen Ma, Yukang Chen, Xiangyu Zhang, Jian Sun

In this report, we present our object detection/instance segmentation system, MegDetV2, which works in a two-pass fashion, first to detect instances then to obtain segmentation.

Instance Segmentation object-detection +3

Paper
Add Code

EqCo: Equivalent Rules for Self-supervised Contrastive Learning

1 code implementation • 5 Oct 2020 • Benjin Zhu, Junqiang Huang, Zeming Li, Xiangyu Zhang, Jian Sun

In this paper, we propose EqCo (Equivalent Rules for Contrastive Learning) to make self-supervised learning irrelevant to the number of negative samples in the contrastive learning framework.

Contrastive Learning Linear evaluation +1

Paper
Code

MPG-Net: Multi-Prediction Guided Network for Segmentation of Retinal Layers in OCT Images

no code implementations • 28 Sep 2020 • Zeyu Fu, Yang Sun, Xiangyu Zhang, Scott Stainton, Shaun Barney, Jeffry Hogg, William Innes, Satnam Dlay

In this paper, we propose a novel multi-prediction guided attention network (MPG-Net) for automated retinal layer segmentation in OCT images.

Segmentation

Paper
Add Code

Deep Learning & Software Engineering: State of Research and Future Directions

1 code implementation • 17 Sep 2020 • Prem Devanbu, Matthew Dwyer, Sebastian Elbaum, Michael Lowry, Kevin Moran, Denys Poshyvanyk, Baishakhi Ray, Rishabh Singh, Xiangyu Zhang

The intent of this report is to serve as a potential roadmap to guide future work that sits at the intersection of SE & DL.

Paper
Code

Activate or Not: Learning Customized Activation

4 code implementations • CVPR 2021 • Ningning Ma, Xiangyu Zhang, Ming Liu, Jian Sun

We present a simple, effective, and general activation function we term ACON which learns to activate the neurons or not.

object-detection Object Detection +1

203

Paper
Code

Black-box Adversarial Sample Generation Based on Differential Evolution

no code implementations • 30 Jul 2020 • Junyu Lin, Lei Xu, Yingqi Liu, Xiangyu Zhang

The technique does not require any knowledge of the structure or weights of the target DNN.

Machine Translation object-detection +1

Paper
Add Code

WeightNet: Revisiting the Design Space of Weight Networks

2 code implementations • ECCV 2020 • Ningning Ma, Xiangyu Zhang, Jiawei Huang, Jian Sun

WeightNet is easy and memory-conserving to train, on the kernel space instead of the feature space.

172

Paper
Code

Funnel Activation for Visual Recognition

6 code implementations • ECCV 2020 • Ningning Ma, Xiangyu Zhang, Jian Sun

We present a conceptually simple but effective funnel activation for image recognition tasks, called Funnel activation (FReLU), that extends ReLU and PReLU to a 2D activation by adding a negligible overhead of spatial condition.

Scene Generation Semantic Segmentation

175

Paper
Code

LabelEnc: A New Intermediate Supervision Method for Object Detection

1 code implementation • ECCV 2020 • Miao Hao, Yitao Liu, Xiangyu Zhang, Jian Sun

In this paper we propose a new intermediate supervision method, named LabelEnc, to boost the training of object detection systems.

Object object-detection +1

Paper
Code

Weight-dependent Gates for Network Pruning

no code implementations • 4 Jul 2020 • Yun Li, Zechun Liu, Weiqun Wu, Haotian Yao, Xiangyu Zhang, Chi Zhang, Baoqun Yin

In this paper, a simple yet effective network pruning framework is proposed to simultaneously address the problems of pruning indicator, pruning ratio, and efficiency constraint.

Network Pruning

Paper
Add Code

Spherical Motion Dynamics: Learning Dynamics of Neural Network with Normalization, Weight Decay, and SGD

no code implementations • 15 Jun 2020 • Ruosi Wan, Zhanxing Zhu, Xiangyu Zhang, Jian Sun

In this work, we comprehensively reveal the learning dynamics of neural network with normalization, weight decay (WD), and SGD (with momentum), named as Spherical Motion Dynamics (SMD).

Paper
Add Code

D-square-B: Deep Distribution Bound for Natural-looking Adversarial Attack

no code implementations • 12 Jun 2020 • Qiu-Ling Xu, Guanhong Tao, Xiangyu Zhang

We propose a novel technique that can generate natural-looking adversarial examples by bounding the variations induced for internal activation values in some deep layer(s), through a distribution quantile bound and a polynomial barrier loss function.

Adversarial Attack

Paper
Add Code

Exhaustive goodness-of-fit via smoothed inference and graphics

1 code implementation • 26 May 2020 • Sara Algeri, Xiangyu Zhang

Classical tests of goodness-of-fit aim to validate the conformity of a postulated model to the data under study.

Methodology Statistics Theory Applications Statistics Theory

Paper
Code

Joint Multi-Dimension Pruning via Numerical Gradient Update

no code implementations • 18 May 2020 • Zechun Liu, Xiangyu Zhang, Zhiqiang Shen, Zhe Li, Yichen Wei, Kwang-Ting Cheng, Jian Sun

To tackle these three naturally different dimensions, we propose a general framework that defines pruning as seeking the best pruning vector (i.e., the numerical values of layer-wise channel number, spatial size, and depth) and constructs a unique mapping from the pruning vector to the pruned network structures.

Paper
Add Code

Angle-based Search Space Shrinking for Neural Architecture Search

1 code implementation • ECCV 2020 • Yiming Hu, Yuding Liang, Zichao Guo, Ruosi Wan, Xiangyu Zhang, Yichen Wei, Qingyi Gu, Jian Sun

Comprehensive experiments show that ABS can dramatically enhance existing NAS approaches by providing a promising shrunk search space.

Neural Architecture Search

Paper
Code

Towards Feature Space Adversarial Attack

1 code implementation • 26 Apr 2020 • Qiu-Ling Xu, Guanhong Tao, Siyuan Cheng, Xiangyu Zhang

We propose a new adversarial attack to Deep Neural Networks for image classification.

Adversarial Attack Detection Image Classification

Paper
Code

Dynamic Scale Training for Object Detection

4 code implementations • 26 Apr 2020 • Yukang Chen, Peizhen Zhang, Zeming Li, Yanwei Li, Xiangyu Zhang, Lu Qi, Jian Sun, Jiaya Jia

We propose a Dynamic Scale Training paradigm (abbreviated as DST) to mitigate the scale variation challenge in object detection.

Instance Segmentation Model Optimization +4

Paper
Code

Personalized Re-ranking for Improving Diversity in Live Recommender Systems

no code implementations • 14 Apr 2020 • Yichao Wang, Xiangyu Zhang, Zhirong Liu, Zhenhua Dong, Xinhua Feng, Ruiming Tang, Xiuqiang He

To overcome this limitation, our re-ranking model proposes a personalized DPP to model the trade-off between accuracy and diversity for each individual user.

Recommendation Systems Re-Ranking

Paper
Add Code

Attentive Normalization for Conditional Image Generation

1 code implementation • CVPR 2020 • Yi Wang, Ying-Cong Chen, Xiangyu Zhang, Jian Sun, Jiaya Jia

Traditional convolution-based generative adversarial networks synthesize images based on hierarchical local operations, where long-range dependency relation is implicitly modeled with a Markov chain.

Conditional Image Generation Semantic correspondence +2

Paper
Code

Learning Human-Object Interaction Detection using Interaction Points

1 code implementation • CVPR 2020 • Tiancai Wang, Tong Yang, Martin Danelljan, Fahad Shahbaz Khan, Xiangyu Zhang, Jian Sun

Human-object interaction (HOI) detection strives to localize both the human and the object and to identify the complex interactions between them.

Human-Object Interaction Detection Keypoint Detection +2

Paper
Code

Dynamic Region-Aware Convolution

no code implementations • CVPR 2021 • Jin Chen, Xijun Wang, Zichao Guo, Xiangyu Zhang, Jian Sun

More gracefully, our DRConv transfers the increasing channel-wise filters to the spatial dimension with a learnable instructor, which not only improves the representation ability of convolution, but also maintains the computational cost and translation-invariance of standard convolution.

Ranked #14 on Semantic Segmentation on MCubeS

Face Recognition General Classification +2

Paper
Add Code

Learning Dynamic Routing for Semantic Segmentation

1 code implementation • CVPR 2020 • Yanwei Li, Lin Song, Yukang Chen, Zeming Li, Xiangyu Zhang, Xingang Wang, Jian Sun

To demonstrate the superiority of the dynamic property, we compare with several static architectures, which can be modeled as special cases in the routing space.

Segmentation Semantic Segmentation

378

Paper
Code

Detection in Crowded Scenes: One Proposal, Multiple Predictions

3 code implementations • CVPR 2020 • Xuangeng Chu, Anlin Zheng, Xiangyu Zhang, Jian Sun

We propose a simple yet effective proposal-based object detector, aiming at detecting highly-overlapped instances in crowded scenes.

Ranked #2 on Pedestrian Detection on TJU-Ped-campus

Object Detection Pedestrian Detection

3,078

Paper
Code

PointINS: Point-based Instance Segmentation

no code implementations • 13 Mar 2020 • Lu Qi, Yi Wang, Yukang Chen, Yingcong Chen, Xiangyu Zhang, Jian Sun, Jiaya Jia

In this paper, we explore the mask representation in instance segmentation with Point-of-Interest (PoI) features.

Instance Segmentation Object Detection +3

Paper
Add Code

Learning Delicate Local Representations for Multi-Person Pose Estimation

4 code implementations • ECCV 2020 • Yuanhao Cai, Zhicheng Wang, Zhengxiong Luo, Binyi Yin, Angang Du, Haoqian Wang, Xiangyu Zhang, Xinyu Zhou, Erjin Zhou, Jian Sun

To tackle this problem, we propose an efficient attention mechanism, the Pose Refine Machine (PRM), to make a trade-off between local and global representations in output features and further refine the keypoint locations.

Ranked #1 on Keypoint Detection on COCO test-challenge

Keypoint Detection Multi-Person Pose Estimation

5,152

Paper
Code

Beyond Application End-Point Results: Quantifying Statistical Robustness of MCMC Accelerators

no code implementations • 5 Mar 2020 • Xiangyu Zhang, Ramin Bashizade, Yicheng Wang, Cheng Lyu, Sayan Mukherjee, Alvin R. Lebeck

Applying the framework to guide design space exploration shows that statistical robustness comparable to floating-point software can be achieved by slightly increasing the bit representation, without floating-point hardware requirements.

Paper
Add Code

Towards Stabilizing Batch Statistics in Backward Propagation of Batch Normalization

1 code implementation • ICLR 2020 • Junjie Yan, Ruosi Wan, Xiangyu Zhang, Wei zhang, Yichen Wei, Jian Sun

Therefore, many modified normalization techniques have been proposed, which either fail to restore the performance of BN completely, or have to introduce additional nonlinear operations in the inference procedure and substantially increase resource consumption.

182

Paper
Code

Learning-Accelerated ADMM for Distributed Optimal Power Flow

no code implementations • 8 Nov 2019 • David Biagioni, Peter Graf, Xiangyu Zhang, Ahmed Zamzam, Kyri Baker, Jennifer King

We propose a novel data-driven method to accelerate the convergence of Alternating Direction Method of Multipliers (ADMM) for solving distributed DC optimal power flow (DC-OPF) where lines are shared between independent network partitions.

Distributed Optimization

Paper
Add Code

A Case for Quantifying Statistical Robustness of Specialized Probabilistic AI Accelerators

no code implementations • 27 Oct 2019 • Xiangyu Zhang, Sayan Mukherjee, Alvin R. Lebeck

Although a common approach is to compare the end-point result quality using community-standard benchmarks and metrics, we claim a probabilistic architecture should provide some measure (or guarantee) of statistical robustness.

Paper
Add Code

Resizable Neural Networks

no code implementations • 25 Sep 2019 • Yichen Zhu, Xiangyu Zhang, Tong Yang, Jian Sun

We introduce the adaptive resizable networks as dynamic networks, which further improve the performance with less computational cost via data-dependent inference.

Data Augmentation Neural Architecture Search

Paper
Add Code

VAENAS: Sampling Matters in Neural Architecture Search

no code implementations • 25 Sep 2019 • Shizheng Qin, Yichen Zhu, Pengfei Hou, Xiangyu Zhang, Wenqiang Zhang, Jian Sun

In this paper, we propose a learnable sampling module based on a variational auto-encoder (VAE) for neural architecture search (NAS), named VAENAS, which can be easily embedded into existing weight-sharing NAS frameworks, e.g., the one-shot approach and the gradient-based approach, and significantly improves the performance of the search results.

Neural Architecture Search

Paper
Add Code

Testing Deep Learning Models for Image Analysis Using Object-Relevant Metamorphic Relations

no code implementations • 6 Sep 2019 • Yongqiang Tian, Shiqing Ma, Ming Wen, Yepang Liu, Shing-Chi Cheung, Xiangyu Zhang

The corresponding rate for the object detection models is over 8.5%.

General Classification Image Classification +4

Paper
Add Code

Arbitrage of Energy Storage in Electricity Markets with Deep Reinforcement Learning

no code implementations • 28 Apr 2019 • Hanchen Xu, Xiao Li, Xiangyu Zhang, Junbo Zhang

In this letter, we address the problem of controlling energy storage systems (ESSs) for arbitrage in real-time electricity markets under price uncertainty.

reinforcement-learning Reinforcement Learning (RL)

Paper
Add Code

Single Path One-Shot Neural Architecture Search with Uniform Sampling

6 code implementations • ECCV 2020 • Zichao Guo, Xiangyu Zhang, Haoyuan Mu, Wen Heng, Zechun Liu, Yichen Wei, Jian Sun

It is easy to train and fast to search.

Ranked #88 on Neural Architecture Search on ImageNet (Accuracy metric)

Neural Architecture Search Quantization

1,390

Paper
Code

DetNAS: Backbone Search for Object Detection

2 code implementations • NeurIPS 2019 • Yukang Chen, Tong Yang, Xiangyu Zhang, Gaofeng Meng, Xinyu Xiao, Jian Sun

In this work, we present DetNAS to use Neural Architecture Search (NAS) for the design of better backbones for object detection.

General Classification Image Classification +4

1,390

Paper
Code

MetaPruning: Meta Learning for Automatic Neural Network Channel Pruning

2 code implementations • ICCV 2019 • Zechun Liu, Haoyuan Mu, Xiangyu Zhang, Zichao Guo, Xin Yang, Tim Kwang-Ting Cheng, Jian Sun

In this paper, we propose a novel meta learning approach for automatic channel pruning of very deep neural networks.

AutoML Meta-Learning

350

Paper
Code

Meta-SR: A Magnification-Arbitrary Network for Super-Resolution

2 code implementations • CVPR 2019 • Xuecai Hu, Haoyuan Mu, Xiangyu Zhang, Zilei Wang, Tieniu Tan, Jian Sun

In this work, we propose a novel method called Meta-SR, the first to solve super-resolution of arbitrary scale factors (including non-integer scale factors) with a single model.

Image Super-Resolution

545

Paper
Code

Daedalus: Breaking Non-Maximum Suppression in Object Detection via Adversarial Examples

1 code implementation • 6 Feb 2019 • Derui Wang, Chaoran Li, Sheng Wen, Qing-Long Han, Surya Nepal, Xiangyu Zhang, Yang Xiang

Experimental results demonstrate that the attack effectively stops NMS from filtering redundant bounding boxes.

Autonomous Vehicles object-detection +1

Paper
Code

Attacks Meet Interpretability: Attribute-steered Detection of Adversarial Samples

1 code implementation • NeurIPS 2018 • Guanhong Tao, Shiqing Ma, Yingqi Liu, Xiangyu Zhang

Results show that our technique can achieve 94% detection accuracy for 7 different kinds of attacks with 9.91% false positives on benign inputs.

Attribute Face Recognition +1

Paper
Code

Bounding Box Regression with Uncertainty for Accurate Object Detection

4 code implementations • CVPR 2019 • Yihui He, Chenchen Zhu, Jianren Wang, Marios Savvides, Xiangyu Zhang

Large-scale object detection datasets (e.g., MS-COCO) try to define the ground truth bounding boxes as clearly as possible.

Ranked #22 on Object Detection on PASCAL VOC 2007

Object object-detection +3

708

Paper
Code

DetNet: Design Backbone for Object Detection

no code implementations • ECCV 2018 • Zeming Li, Chao Peng, Gang Yu, Xiangyu Zhang, Yangdong Deng, Jian Sun

(1) Recent object detectors like FPN and RetinaNet usually involve extra stages, compared with the task of image classification, to handle objects with various scales.

Classification General Classification +7

Paper
Add Code

An Open Access Database for Evaluating the Algorithms of Electrocardiogram Rhythm and Morphology Abnormality Detection

no code implementations • JMIHI 2018 • Feifei Liu, Chengyu Liu, Lina Zhao, Xiangyu Zhang, Xiaoling Wu, Xiaoyan Xu, Yulin Liu, Caiyun Ma, Shoushui Wei, Zhiqiang He, Jianqing Li, Eddie Ng Yin Kwee

Over the past few decades, methods for classification and detection of rhythm or morphology abnormalities in ECG signals have been widely studied.

Anomaly Detection

Paper
Add Code

ShuffleNet V2: Practical Guidelines for Efficient CNN Architecture Design

35 code implementations • ECCV 2018 • Ningning Ma, Xiangyu Zhang, Hai-Tao Zheng, Jian Sun

Datasets, Transforms and Models specific to Computer Vision

Ranked #877 on Image Classification on ImageNet

Image Classification Object Detection

15,597

Paper
Code

MetaAnchor: Learning to Detect Objects with Customized Anchors

no code implementations • NeurIPS 2018 • Tong Yang, Xiangyu Zhang, Zeming Li, Wenqiang Zhang, Jian Sun

We propose a novel and flexible anchor mechanism named MetaAnchor for object detection frameworks.

Object object-detection +1

Paper
Add Code

CrowdHuman: A Benchmark for Detecting Human in a Crowd

1 code implementation • 30 Apr 2018 • Shuai Shao, Zijian Zhao, Boxun Li, Tete Xiao, Gang Yu, Xiangyu Zhang, Jian Sun

There are a total of $470K$ human instances from the train and validation subsets, and $\sim 22.6$ persons per image, with various kinds of occlusions in the dataset.

Ranked #7 on Pedestrian Detection on Caltech (using extra training data)

Human Detection Object Detection +1

Paper
Code

DetNet: A Backbone network for Object Detection

2 code implementations • 17 Apr 2018 • Zeming Li, Chao Peng, Gang Yu, Xiangyu Zhang, Yangdong Deng, Jian Sun

Due to the gap between the image classification and object detection, we propose DetNet in this paper, which is a novel backbone network specifically designed for object detection.

Classification General Classification +7

Paper
Code

ExFuse: Enhancing Feature Fusion for Semantic Segmentation

no code implementations • ECCV 2018 • Zhenli Zhang, Xiangyu Zhang, Chao Peng, Dazhi Cheng, Jian Sun

Modern semantic segmentation frameworks usually combine low-level and high-level features from pre-trained backbone convolutional models to boost performance.

Ranked #4 on Semantic Segmentation on PASCAL VOC 2012 val (using extra training data)

Segmentation Semantic Segmentation

Paper
Add Code

MegDet: A Large Mini-Batch Object Detector

6 code implementations • CVPR 2018 • Chao Peng, Tete Xiao, Zeming Li, Yuning Jiang, Xiangyu Zhang, Kai Jia, Gang Yu, Jian Sun

The improvements in recent CNN-based object detection works, from R-CNN [11], Fast/Faster R-CNN [10, 31] to recent Mask R-CNN [14] and RetinaNet [24], mainly come from new network, new framework, or novel loss design.

Object object-detection +1

4,867

Paper
Code

Light-Head R-CNN: In Defense of Two-Stage Object Detector

5 code implementations • 20 Nov 2017 • Zeming Li, Chao Peng, Gang Yu, Xiangyu Zhang, Yangdong Deng, Jian Sun

More importantly, simply replacing the backbone with a tiny network (e.g., Xception), our Light-Head R-CNN gets 30.7 mmAP at 102 FPS on COCO, significantly outperforming the single-stage, fast detectors like YOLO and SSD on both speed and accuracy.

Vocal Bursts Valence Prediction

184

Paper
Code

Channel Pruning for Accelerating Very Deep Neural Networks

1 code implementation • ICCV 2017 • Yihui He, Xiangyu Zhang, Jian Sun

In this paper, we introduce a new channel pruning method to accelerate very deep convolutional neural networks. Given a trained CNN model, we propose an iterative two-step algorithm to effectively prune each layer, by a LASSO regression based channel selection and least square reconstruction.

regression

1,067

Paper
Code

ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices

37 code implementations • CVPR 2018 • Xiangyu Zhang, Xinyu Zhou, Mengxiao Lin, Jian Sun

We introduce an extremely computation-efficient CNN architecture named ShuffleNet, which is designed specially for mobile devices with very limited computing power (e.g., 10-150 MFLOPs).

Ranked #79 on Person Re-Identification on DukeMTMC-reID

General Classification Image Classification +2

6,295

Paper
Code

Large Kernel Matters -- Improve Semantic Segmentation by Global Convolutional Network

2 code implementations • CVPR 2017 • Chao Peng, Xiangyu Zhang, Gang Yu, Guiming Luo, Jian Sun

One of the recent trends [30, 31, 14] in network architecture design is stacking small filters (e.g., 1x1 or 3x3) in the entire network because the stacked small filters are more efficient than a large kernel, given the same computational complexity.

Ranked #8 on Semantic Segmentation on PASCAL VOC 2012 val

Semantic Segmentation

1,594

Paper
Code

Identity Mappings in Deep Residual Networks

55 code implementations • 16 Mar 2016 • Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun

Deep residual networks have emerged as a family of extremely deep architectures showing compelling accuracy and nice convergence behaviors.

Ranked #17 on Image Classification on Kuzushiji-MNIST

Image Classification

76,695

Paper
Code

Deep Residual Learning for Image Recognition

471 code implementations • CVPR 2016 • Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun

Deep residual nets are foundations of our submissions to ILSVRC & COCO 2015 competitions, where we also won the 1st places on the tasks of ImageNet detection, ImageNet localization, COCO detection, and COCO segmentation.

Ranked #1 on Image Classification on cifar100

Domain Generalization +11

76,695

Paper
Code

Accelerating Very Deep Convolutional Networks for Classification and Detection

no code implementations • 26 May 2015 • Xiangyu Zhang, Jianhua Zou, Kaiming He, Jian Sun

This paper aims to accelerate the test-time computation of convolutional neural networks (CNNs), especially very deep CNNs that have substantially impacted the computer vision community.

Classification General Classification +3

Paper
Add Code

Object Detection Networks on Convolutional Feature Maps

no code implementations • 23 Apr 2015 • Shaoqing Ren, Kaiming He, Ross Girshick, Xiangyu Zhang, Jian Sun

We discover that aside from deep feature maps, a deep and convolutional per-region classifier is of particular importance for object detection, whereas latest superior image classification models (such as ResNets and GoogLeNets) do not directly lead to good detection accuracy without using such a per-region classifier.

General Classification Image Classification +3

Paper
Add Code

Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification

16 code implementations • ICCV 2015 • Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun

In this work, we study rectifier neural networks for image classification from two aspects.

General Classification Image Classification

171

Paper
Code

Efficient and Accurate Approximations of Nonlinear Convolutional Networks

no code implementations • CVPR 2015 • Xiangyu Zhang, Jianhua Zou, Xiang Ming, Kaiming He, Jian Sun

This paper aims to accelerate the test-time computation of deep convolutional neural networks (CNNs).

Paper
Add Code

Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition

14 code implementations • 18 Jun 2014 • Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun

This requirement is "artificial" and may reduce the recognition accuracy for the images or sub-images of an arbitrary size/scale.

Ranked #26 on Object Detection on PASCAL VOC 2007

General Classification Image Classification +3

394

Paper
Code
