1 code implementation • NeurIPS 2023 • Meng Wei, Xiaoyu Yue, Wenwei Zhang, Shu Kong, Xihui Liu, Jiangmiao Pang
Secondly, part segmentation introduces an open granularity challenge due to the diverse and often ambiguous definitions of parts in the open world.
no code implementations • 3 Oct 2023 • Xiaoyu Yue, Lei Bai, Meng Wei, Jiangmiao Pang, Xihui Liu, Luping Zhou, Wanli Ouyang
Masked AutoEncoder (MAE) has revolutionized the field of self-supervised learning with its simple yet effective masking and reconstruction strategies.
no code implementations • 18 Jul 2023 • Meng Wei, Long Chen, Wei Ji, Xiaoyu Yue, Roger Zimmermann
While recent video-based methods utilizing video tubelets have shown promising results, we argue that the effective modeling of spatial and temporal context plays a more significant role than the choice between clip tubelets and video tubelets.
1 code implementation • 10 Dec 2021 • Meng Wei, Long Chen, Wei Ji, Xiaoyu Yue, Tat-Seng Chua
Since each verb is associated with a specific set of semantic roles, all existing GSR methods resort to a two-stage framework: predicting the verb in the first stage and detecting the semantic roles in the second stage.
Ranked #3 on Situation Recognition on imSitu
1 code implementation • 14 Aug 2021 • Zhanghui Kuang, Hongbin Sun, Zhizhong Li, Xiaoyu Yue, Tsui Hin Lin, Jianyong Chen, Huaqiang Wei, Yiqin Zhu, Tong Gao, Wenwei Zhang, Kai Chen, Wayne Zhang, Dahua Lin
We present MMOCR-an open-source toolbox which provides a comprehensive pipeline for text detection and recognition, as well as their downstream tasks such as named entity recognition and key information extraction.
1 code implementation • ICCV 2021 • Xiaoyu Yue, Shuyang Sun, Zhanghui Kuang, Meng Wei, Philip Torr, Wayne Zhang, Dahua Lin
As a typical example, the Vision Transformer (ViT) directly applies a pure transformer architecture on image classification, by simply splitting images into tokens with a fixed length, and employing transformers to learn relations between these tokens.
2 code implementations • 13 Jul 2021 • Shuyang Sun, Xiaoyu Yue, Song Bai, Philip Torr
To model the representations of the two levels, we first encode the information from the whole into part vectors through an attention mechanism, then decode the global information within the part vectors back into the whole representation.
Ranked #313 on Image Classification on ImageNet
2 code implementations • 26 Mar 2021 • Hongbin Sun, Zhanghui Kuang, Xiaoyu Yue, Chenhao Lin, Wayne Zhang
In order to roundly evaluate our proposed method as well as boost the future research, we release a new dataset named WildReceipt, which is collected and annotated tailored for the evaluation of key information extraction from document images of unseen templates in the wild.
no code implementations • ICCV 2021 • Shuyang Sun, Xiaoyu Yue, Xiaojuan Qi, Wanli Ouyang, Victor Adrian Prisacariu, Philip H.S. Torr
Aggregating features from different depths of a network is widely adopted to improve the network capability.
no code implementations • 12 Aug 2020 • Meng Wei, Chun Yuan, Xiaoyu Yue, Kuo Zhong
Second, since learning too many context-specific classification subspaces can suffer from data sparsity issues, we propose a hierarchical semantic aggregation(HSA) module to reduces the number of subspaces by introducing higher order structural information.
4 code implementations • ECCV 2020 • Xiaoyu Yue, Zhanghui Kuang, Chenhao Lin, Hongbin Sun, Wayne Zhang
Theoretically, our proposed method, dubbed \emph{RobustScanner}, decodes individual characters with dynamic ratio between context and positional clues, and utilizes more positional ones when the decoding sequences with scarce context, and thus is robust and practical.
1 code implementation • ICCV 2019 • Youjiang Xu, Jiaqi Duan, Zhanghui Kuang, Xiaoyu Yue, Hongbin Sun, Yue Guan, Wayne Zhang
Large geometry (e. g., orientation) variances are the key challenges in the scene text detection.
Ranked #10 on Scene Text Detection on ICDAR 2017 MLT
no code implementations • 10 May 2018 • Xiaoyu Yue, Zhanghui Kuang, Zhaoyang Zhang, Zhenfang Chen, Pan He, Yu Qiao, Wei zhang
Deep CNNs have achieved great success in text detection.