Search Results for author: Zheng Ge

Found 31 papers, 18 papers with code

Self-Supervised Visual Preference Alignment

1 code implementation • 16 Apr 2024 • Ke Zhu, Liang Zhao, Zheng Ge, Xiangyu Zhang

We generate chosen and rejected responses with regard to the original and augmented image pairs, and conduct preference alignment with direct preference optimization.
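As a rough illustration of the preference-alignment step, the sketch below computes the standard direct preference optimization (DPO) loss for a single chosen/rejected pair. It assumes the chosen response comes from the original image and the rejected response from the augmented one. The function name `dpo_loss`, its arguments, and the default `beta` are hypothetical and not taken from the paper's code.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard DPO loss for one preference pair.

    Inputs are summed token log-probabilities of each response under the
    trainable policy and the frozen reference model (hypothetical names).
    """
    # Implicit rewards: how much the policy has moved away from the reference.
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp
    logits = beta * (chosen_margin - rejected_margin)
    # -log(sigmoid(x)) written in a numerically stable form.
    return max(-logits, 0.0) + math.log1p(math.exp(-abs(logits)))
```

The loss falls as the policy assigns relatively more probability to the chosen response than the reference model does.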

Ranked #34 on Visual Question Answering on MM-Vet

8k Visual Question Answering

OneChart: Purify the Chart Structural Extraction via One Auxiliary Token

1 code implementation • 15 Apr 2024 • Jinyue Chen, Lingyu Kong, Haoran Wei, Chenglong Liu, Zheng Ge, Liang Zhao, Jianjian Sun, Chunrui Han, Xiangyu Zhang

To address this, we propose OneChart: a reliable agent specifically devised for the structural extraction of chart information.

Decoder

ShapeLLM: Universal 3D Object Understanding for Embodied Interaction

3 code implementations • 27 Feb 2024 • Zekun Qi, Runpei Dong, Shaochen Zhang, Haoran Geng, Chunrui Han, Zheng Ge, He Wang, Li Yi, Kaisheng Ma

This paper presents ShapeLLM, the first 3D Multimodal Large Language Model (LLM) designed for embodied interaction, exploring a universal 3D object understanding with 3D point clouds and languages.

Ranked #1 on 3D Point Cloud Linear Classification on ModelNet40

3D Object Captioning 3D Point Cloud Linear Classification +10

112

Small Language Model Meets with Reinforced Vision Vocabulary

no code implementations • 23 Jan 2024 • Haoran Wei, Lingyu Kong, Jinyue Chen, Liang Zhao, Zheng Ge, En Yu, Jianjian Sun, Chunrui Han, Xiangyu Zhang

In Vary-toy, we introduce an improved vision vocabulary, allowing the model to not only possess all features of Vary but also gather more generality.

Ranked #81 on Visual Question Answering on MM-Vet

Language Modelling Large Language Model +3

Vary: Scaling up the Vision Vocabulary for Large Vision-Language Models

1 code implementation • 11 Dec 2023 • Haoran Wei, Lingyu Kong, Jinyue Chen, Liang Zhao, Zheng Ge, Jinrong Yang, Jianjian Sun, Chunrui Han, Xiangyu Zhang

Accordingly, we propose Vary, an efficient and effective method to scale up the vision vocabulary of LVLMs.

Ranked #56 on Visual Question Answering on MM-Vet

Decoder Optical Character Recognition (OCR) +1

1,580

Merlin: Empowering Multimodal LLMs with Foresight Minds

no code implementations • 30 Nov 2023 • En Yu, Liang Zhao, Yana Wei, Jinrong Yang, Dongming Wu, Lingyu Kong, Haoran Wei, Tiancai Wang, Zheng Ge, Xiangyu Zhang, Wenbing Tao

Then, FIT requires MLLMs to first predict trajectories of related objects and then reason about potential future events based on them.

Ranked #66 on Visual Question Answering on MM-Vet

Visual Question Answering

DreamLLM: Synergistic Multimodal Comprehension and Creation

1 code implementation • 20 Sep 2023 • Runpei Dong, Chunrui Han, Yuang Peng, Zekun Qi, Zheng Ge, Jinrong Yang, Liang Zhao, Jianjian Sun, HongYu Zhou, Haoran Wei, Xiangwen Kong, Xiangyu Zhang, Kaisheng Ma, Li Yi

This paper presents DreamLLM, a learning framework that first achieves versatile Multimodal Large Language Models (MLLMs) empowered with frequently overlooked synergy between multimodal comprehension and creation.

Ranked #2 on Visual Question Answering on MMBench (GPT-3.5 score metric)

multimodal generation Visual Question Answering +2

318

GroupLane: End-to-End 3D Lane Detection with Channel-wise Grouping

no code implementations • 18 Jul 2023 • Zhuoling Li, Chunrui Han, Zheng Ge, Jinrong Yang, En Yu, Haoqian Wang, Hengshuang Zhao, Xiangyu Zhang

Besides, GroupLane with ResNet18 still surpasses PersFormer by 4.9% F1 score, while the inference speed is nearly 7x faster and the FLOPs is only 13.3% of it.

3D Lane Detection

ChatSpot: Bootstrapping Multimodal LLMs via Precise Referring Instruction Tuning

no code implementations • 18 Jul 2023 • Liang Zhao, En Yu, Zheng Ge, Jinrong Yang, Haoran Wei, HongYu Zhou, Jianjian Sun, Yuang Peng, Runpei Dong, Chunrui Han, Xiangyu Zhang

Based on precise referring instruction, we propose ChatSpot, a unified end-to-end multimodal large language model that supports diverse forms of interactivity including mouse clicks, drag-and-drop, and drawing boxes, which provides a more flexible and seamless interactive experience.

Instruction Following Language Modelling +1

GMM: Delving into Gradient Aware and Model Perceive Depth Mining for Monocular 3D Detection

no code implementations • 30 Jun 2023 • Weixin Mao, Jinrong Yang, Zheng Ge, Lin Song, HongYu Zhou, Tiezheng Mao, Zeming Li, Osamu Yoshie

In light of the success of sample mining techniques in 2D object detection, we propose a simple yet effective mining strategy for improving depth perception in 3D object detection.

3D Object Detection Depth Estimation +3

The 1st-place Solution for CVPR 2023 OpenLane Topology in Autonomous Driving Challenge

1 code implementation • 16 Jun 2023 • Dongming Wu, Fan Jia, Jiahao Chang, Zhuoling Li, Jianjian Sun, Chunrui Han, Shuailin Li, Yingfei Liu, Zheng Ge, Tiancai Wang

We present the 1st-place solution of OpenLane Topology in Autonomous Driving Challenge.

Autonomous Driving

120

Align-DETR: Improving DETR with Simple IoU-aware BCE loss

1 code implementation • 15 Apr 2023 • Zhi Cai, Songtao Liu, Guodong Wang, Zheng Ge, Xiangyu Zhang, Di Huang

We propose a metric, recall of best-regressed samples, to quantitatively evaluate the misalignment problem.

object-detection Object Detection

BEVStereo++: Accurate Depth Estimation in Multi-view 3D Object Detection via Dynamic Temporal Stereo

no code implementations • 9 Apr 2023 • Yinhao Li, Jinrong Yang, Jianjian Sun, Han Bao, Zheng Ge, Li Xiao

Bounded by the inherent ambiguity of depth perception, contemporary multi-view 3D object detection methods fall into the performance bottleneck.

3D Object Detection Depth Estimation +2

Exploring Recurrent Long-term Temporal Fusion for Multi-view 3D Perception

no code implementations • 10 Mar 2023 • Chunrui Han, Jinrong Yang, Jianjian Sun, Zheng Ge, Runpei Dong, HongYu Zhou, Weixin Mao, Yuang Peng, Xiangyu Zhang

In this paper, we explore an embarrassingly simple long-term recurrent fusion strategy built upon the LSS-based methods and find it already able to enjoy the merits from both sides, i.e., rich long-term information and efficient fusion pipeline.

motion prediction object-detection +1

Contrast with Reconstruct: Contrastive 3D Representation Learning Guided by Generative Pretraining

3 code implementations • 5 Feb 2023 • Zekun Qi, Runpei Dong, Guofan Fan, Zheng Ge, Xiangyu Zhang, Kaisheng Ma, Li Yi

This motivates us to learn 3D representations by sharing the merits of both paradigms, which is non-trivial due to the pattern difference between the two paradigms.

Ranked #1 on Zero-Shot Transfer 3D Point Cloud Classification on ModelNet10 (using extra training data)

3D Point Cloud Linear Classification Decoder +3

112

Autoencoders as Cross-Modal Teachers: Can Pretrained 2D Image Transformers Help 3D Representation Learning?

3 code implementations • 16 Dec 2022 • Runpei Dong, Zekun Qi, Linfeng Zhang, Junbo Zhang, Jianjian Sun, Zheng Ge, Li Yi, Kaisheng Ma

The success of deep learning heavily relies on large-scale data with comprehensive labels, which is more expensive and time-consuming to fetch in 3D compared to 2D images or natural languages.

Ranked #5 on Few-Shot 3D Point Cloud Classification on ModelNet40 10-way (10-shot) (using extra training data)

Few-Shot 3D Point Cloud Classification Knowledge Distillation +1

112

MatrixVT: Efficient Multi-Camera to BEV Transformation for 3D Perception

2 code implementations • ICCV 2023 • HongYu Zhou, Zheng Ge, Zeming Li, Xiangyu Zhang

This paper proposes an efficient multi-camera to Bird's-Eye-View (BEV) view transformation method for 3D perception, dubbed MatrixVT.

Ranked #2 on Bird's-Eye View Semantic Segmentation on nuScenes (IoU lane - 224x480 - 100x100 at 0.5 metric)

Autonomous Driving Bird's-Eye View Semantic Segmentation +2

666

Towards 3D Object Detection with 2D Supervision

no code implementations • 15 Nov 2022 • Jinrong Yang, Tiancai Wang, Zheng Ge, Weixin Mao, Xiaoping Li, Xiangyu Zhang

We propose a temporal 2D transformation to bridge the 3D predictions with temporal 2D labels.

3D Object Detection Object +1

Implicit Identity Leakage: The Stumbling Block to Improving Deepfake Detection Generalization

1 code implementation • CVPR 2023 • Shichao Dong, Jin Wang, Renhe Ji, Jiajun Liang, Haoqiang Fan, Zheng Ge

In this paper, we analyse the generalization ability of binary classifiers for the task of deepfake detection.

DeepFake Detection Face Swapping

119

BEVStereo: Enhancing Depth Estimation in Multi-view 3D Object Detection with Dynamic Temporal Stereo

3 code implementations • 21 Sep 2022 • Yinhao Li, Han Bao, Zheng Ge, Jinrong Yang, Jianjian Sun, Zeming Li

To this end, we introduce an effective temporal stereo method that dynamically selects the scale of matching candidates, significantly reducing computation overhead.

Ranked #11 on 3D Object Detection on nuScenes Camera Only

3D Object Detection Depth Estimation +1

666

STS: Surround-view Temporal Stereo for Multi-view 3D Detection

no code implementations • 22 Aug 2022 • Zengran Wang, Chen Min, Zheng Ge, Yinhao Li, Zeming Li, Hongyu Yang, Di Huang

Instead of using a sole monocular depth method, in this work, we propose a novel Surround-view Temporal Stereo (STS) technique that leverages the geometry correspondence between frames across time to facilitate accurate depth learning.

3D Object Detection Depth Estimation +4

PersDet: Monocular 3D Detection in Perspective Bird's-Eye-View

no code implementations • 19 Aug 2022 • HongYu Zhou, Zheng Ge, Weixin Mao, Zeming Li

To address this problem, we revisit the generation of BEV representation and propose detecting objects in perspective BEV -- a new BEV representation that does not require feature sampling.

Autonomous Driving object-detection +1

Dense Teacher: Dense Pseudo-Labels for Semi-supervised Object Detection

2 code implementations • 6 Jul 2022 • HongYu Zhou, Zheng Ge, Songtao Liu, Weixin Mao, Zeming Li, Haiyan Yu, Jian Sun

To date, the most powerful semi-supervised object detectors (SS-OD) are based on pseudo-boxes, which need a sequence of post-processing with fine-tuned hyper-parameters.

Ranked #4 on Semi-Supervised Object Detection on COCO 100% labeled data

object-detection Object Detection +2

12,146

BEVDepth: Acquisition of Reliable Depth for Multi-view 3D Object Detection

2 code implementations • 21 Jun 2022 • Yinhao Li, Zheng Ge, Guanyi Yu, Jinrong Yang, Zengran Wang, Yukang Shi, Jianjian Sun, Zeming Li

In this research, we propose a new 3D object detector with a trustworthy depth estimation, dubbed BEVDepth, for camera-based Bird's-Eye-View (BEV) 3D object detection.

Ranked #4 on 3D Object Detection on Rope3D

3D Object Detection Depth Estimation +1

666

Workshop on Autonomous Driving at CVPR 2021: Technical Report for Streaming Perception Challenge

1 code implementation • 27 Jul 2021 • Songyang Zhang, Lin Song, Songtao Liu, Zheng Ge, Zeming Li, Xuming He, Jian Sun

In this report, we introduce our real-time 2D object detection system for the realistic autonomous driving scenario.

Autonomous Driving object-detection +1

9,049

YOLOX: Exceeding YOLO Series in 2021

41 code implementations • 18 Jul 2021 • Zheng Ge, Songtao Liu, Feng Wang, Zeming Li, Jian Sun

In this report, we present some experienced improvements to YOLO series, forming a new high-performance detector -- YOLOX.

Ranked #1 on Real-Time Object Detection on Argoverse-HD (Detection-Only, Val) (using extra training data)

Autonomous Driving Real-Time Object Detection

27,976

OTA: Optimal Transport Assignment for Object Detection

2 code implementations • CVPR 2021 • Zheng Ge, Songtao Liu, Zeming Li, Osamu Yoshie, Jian Sun

Recent advances in label assignment in object detection mainly seek to independently define positive/negative training samples for each ground-truth (gt) object.

Ranked #73 on Object Detection on COCO test-dev

Object object-detection +1

242

LLA: Loss-aware Label Assignment for Dense Pedestrian Detection

1 code implementation • 12 Jan 2021 • Zheng Ge, JianFeng Wang, Xin Huang, Songtao Liu, Osamu Yoshie

A joint loss is then defined as the weighted summation of cls and reg losses as the assigning indicator.
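A minimal sketch of the weighted-sum indicator, under stated assumptions: the per-anchor losses are given as plain lists, the weight `reg_weight` is a hypothetical parameter, and picking the `k` anchors with the lowest joint loss as positives is an illustrative selection rule that the summary above does not spell out.

```python
def joint_loss(cls_loss, reg_loss, reg_weight=1.0):
    """Weighted sum of classification and regression losses for one anchor."""
    return cls_loss + reg_weight * reg_loss


def assign_positives(cls_losses, reg_losses, k=2, reg_weight=1.0):
    """Return the indices of the k anchors with the lowest joint loss.

    The top-k selection rule is an assumption made for illustration.
    """
    scores = [joint_loss(c, r, reg_weight)
              for c, r in zip(cls_losses, reg_losses)]
    # Rank anchors from lowest to highest joint loss.
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    return sorted(order[:k])
```

Raising `reg_weight` makes localization quality count for more when ranking candidate anchors.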

object-detection Object Detection +1

Delving into the Imbalance of Positive Proposals in Two-stage Object Detection

no code implementations • 23 May 2020 • Zheng Ge, Zequn Jie, Xin Huang, Chengzheng Li, Osamu Yoshie

The first imbalance lies in the large number of low-quality RPN proposals, which makes the R-CNN module (i.e., post-classification layers) become highly biased towards the negative proposals in the early training stage.

object-detection Object Detection

NMS by Representative Region: Towards Crowded Pedestrian Detection by Proposal Pairing

no code implementations • CVPR 2020 • Xin Huang, Zheng Ge, Zequn Jie, Osamu Yoshie

To acquire the visible parts, a novel Paired-Box Model (PBM) is proposed to simultaneously predict the full and visible boxes of a pedestrian.

Pedestrian Detection

PS-RCNN: Detecting Secondary Human Instances in a Crowd via Primary Object Suppression

no code implementations • 16 Mar 2020 • Zheng Ge, Zequn Jie, Xin Huang, Rong Xu, Osamu Yoshie

PS-RCNN first detects slightly occluded or non-occluded objects with an R-CNN module (referred to as P-RCNN), and then suppresses the detected instances with human-shaped masks so that the features of heavily occluded instances can stand out.

Ranked #2 on Object Detection on WiderPerson

Human Detection Object Detection
