Search Results for author: Xiaoyu Yue

Found 13 papers, 8 papers with code

OV-PARTS: Towards Open-Vocabulary Part Segmentation

1 code implementation • NeurIPS 2023 • Meng Wei, Xiaoyu Yue, Wenwei Zhang, Shu Kong, Xihui Liu, Jiangmiao Pang

Secondly, part segmentation introduces an open granularity challenge due to the diverse and often ambiguous definitions of parts in the open world.

Open Vocabulary Semantic Segmentation Segmentation +1

Understanding Masked Autoencoders From a Local Contrastive Perspective

no code implementations • 3 Oct 2023 • Xiaoyu Yue, Lei Bai, Meng Wei, Jiangmiao Pang, Xihui Liu, Luping Zhou, Wanli Ouyang

Masked AutoEncoder (MAE) has revolutionized the field of self-supervised learning with its simple yet effective masking and reconstruction strategies.

Contrastive Learning Data Augmentation +2

In Defense of Clip-based Video Relation Detection

no code implementations • 18 Jul 2023 • Meng Wei, Long Chen, Wei Ji, Xiaoyu Yue, Roger Zimmermann

While recent video-based methods utilizing video tubelets have shown promising results, we argue that the effective modeling of spatial and temporal context plays a more significant role than the choice between clip tubelets and video tubelets.

Feature Compression Object Tracking +2

Rethinking the Two-Stage Framework for Grounded Situation Recognition

1 code implementation • 10 Dec 2021 • Meng Wei, Long Chen, Wei Ji, Xiaoyu Yue, Tat-Seng Chua

Since each verb is associated with a specific set of semantic roles, all existing GSR methods resort to a two-stage framework: predicting the verb in the first stage and detecting the semantic roles in the second stage.

Ranked #3 on Situation Recognition on imSitu

Grounded Situation Recognition Object Recognition +1

MMOCR: A Comprehensive Toolbox for Text Detection, Recognition and Understanding

1 code implementation • 14 Aug 2021 • Zhanghui Kuang, Hongbin Sun, Zhizhong Li, Xiaoyu Yue, Tsui Hin Lin, Jianyong Chen, Huaqiang Wei, Yiqin Zhu, Tong Gao, Wenwei Zhang, Kai Chen, Wayne Zhang, Dahua Lin

We present MMOCR, an open-source toolbox that provides a comprehensive pipeline for text detection and recognition, as well as their downstream tasks such as named entity recognition and key information extraction.

Key Information Extraction named-entity-recognition +4

4,099

Vision Transformer with Progressive Sampling

1 code implementation • ICCV 2021 • Xiaoyu Yue, Shuyang Sun, Zhanghui Kuang, Meng Wei, Philip Torr, Wayne Zhang, Dahua Lin

As a typical example, the Vision Transformer (ViT) directly applies a pure transformer architecture to image classification by simply splitting images into tokens with a fixed length and employing transformers to learn relations between these tokens.

Image Classification

147

Visual Parser: Representing Part-whole Hierarchies with Transformers

2 code implementations • 13 Jul 2021 • Shuyang Sun, Xiaoyu Yue, Song Bai, Philip Torr

To model the representations of the two levels, we first encode the information from the whole into part vectors through an attention mechanism, then decode the global information within the part vectors back into the whole representation.

Ranked #313 on Image Classification on ImageNet

Decoder Image Classification +4

124

Spatial Dual-Modality Graph Reasoning for Key Information Extraction

2 code implementations • 26 Mar 2021 • Hongbin Sun, Zhanghui Kuang, Xiaoyu Yue, Chenhao Lin, Wayne Zhang

To comprehensively evaluate our proposed method and to boost future research, we release a new dataset named WildReceipt, which is collected and annotated specifically for evaluating key information extraction from document images of unseen templates in the wild.

Key Information Extraction Template Matching

38,826

Aggregation With Feature Detection

no code implementations • ICCV 2021 • Shuyang Sun, Xiaoyu Yue, Xiaojuan Qi, Wanli Ouyang, Victor Adrian Prisacariu, Philip H.S. Torr

Aggregating features from different depths of a network is widely adopted to improve the network capability.

Instance Segmentation object-detection +2

HOSE-Net: Higher Order Structure Embedded Network for Scene Graph Generation

no code implementations • 12 Aug 2020 • Meng Wei, Chun Yuan, Xiaoyu Yue, Kuo Zhong

Second, since learning too many context-specific classification subspaces can suffer from data sparsity issues, we propose a hierarchical semantic aggregation (HSA) module to reduce the number of subspaces by introducing higher order structural information.

General Classification Graph Generation +5

RobustScanner: Dynamically Enhancing Positional Clues for Robust Text Recognition

4 code implementations • ECCV 2020 • Xiaoyu Yue, Zhanghui Kuang, Chenhao Lin, Hongbin Sun, Wayne Zhang

Theoretically, our proposed method, dubbed RobustScanner, decodes individual characters with a dynamic ratio between context and positional clues, and utilizes more positional ones when decoding sequences with scarce context, and thus is robust and practical.

Decoder Irregular Text Recognition +2