3 code implementations • 8 Apr 2024 • Chuwei Luo, Yufan Shen, Zhaoqing Zhu, Qi Zheng, Zhi Yu, Cong Yao
The core of LayoutLLM is a layout instruction tuning strategy, which is specially designed to enhance the comprehension and utilization of document layouts.
1 code implementation • 28 Mar 2024 • Jianqiang Wan, Sibo Song, Wenwen Yu, Yuliang Liu, Wenqing Cheng, Fei Huang, Xiang Bai, Cong Yao, Zhibo Yang
Recently, visually-situated text parsing (VsTP) has experienced notable advancements, driven by the increasing demand for automated document understanding and the emergence of Generative Large Language Models (LLMs) capable of processing document-based questions.
no code implementations • 20 Mar 2024 • Yuyi Zhang, Yuanzhi Zhu, Dezhi Peng, Peirong Zhang, Zhenhua Yang, Zhibo Yang, Cong Yao, Lianwen Jin
Text recognition, especially for complex scripts like Chinese, faces unique challenges due to its intricate character structures and vast vocabulary.
no code implementations • 3 Jan 2024 • Rujiao Long, Hangdi Xing, Zhibo Yang, Qi Zheng, Zhi Yu, Cong Yao, Fei Huang
We model TSR as a logical location regression problem and propose a new TSR framework called LORE, standing for LOgical location REgression network, which for the first time regresses logical location as well as spatial location of table cells in a unified network.
1 code implementation • 19 Dec 2023 • Zhenhua Yang, Dezhi Peng, Yuxin Kong, Yuyi Zhang, Cong Yao, Lianwen Jin
Automatic font generation is an imitation task, which aims to create a font library that mimics the style of reference images while preserving the content from source images.
1 code implementation • 19 Oct 2023 • Cong Yao
In this report, we introduce DocXChain, a powerful open-source toolchain for document parsing, which is designed and developed to automatically convert the rich information embodied in unstructured documents, such as text, tables and charts, into structured representations that are readable and manipulable by machines.
1 code implementation • ICCV 2023 • Cheng Da, Chuwei Luo, Qi Zheng, Cong Yao
Document pre-trained models and grid-based models have proven to be very effective on various tasks in Document AI.
Ranked #1 on Document Layout Analysis on PubLayNet val
1 code implementation • ICCV 2023 • Changxu Cheng, Peng Wang, Cheng Da, Qi Zheng, Cong Yao
The diversity in length constitutes a significant characteristic of text.
1 code implementation • 25 Jul 2023 • Cheng Da, Peng Wang, Cong Yao
Specifically, MGP-STR achieves an average recognition accuracy of 94% on standard benchmarks for scene text recognition.
no code implementations • CVPR 2023 • Yuanzhi Zhu, Zhaohai Li, Tianwei Wang, Mengchao He, Cong Yao
Current text recognition systems, including those for handwritten scripts and scene text, have relied heavily on image synthesis and augmentation, since it is difficult to realize real-world complexity and diversity through collecting and annotating enough real text images.
1 code implementation • CVPR 2023 • Chuwei Luo, Changxu Cheng, Qi Zheng, Cong Yao
Additionally, novel relation heads, which are pre-trained by the geometric pre-training tasks and fine-tuned for RE, are elaborately designed to enrich and enhance the feature representation.
Ranked #1 on Key Information Extraction on CORD
no code implementations • CVPR 2023 • Zhibo Yang, Rujiao Long, Pengfei Wang, Sibo Song, Humen Zhong, Wenqing Cheng, Xiang Bai, Cong Yao
As the first contribution of this work, we curate and release a new dataset for VIE, in which the document images are much more challenging in that they are taken from real applications, and difficulties such as blur, partial occlusion, and printing shift are quite common.
1 code implementation • 7 Mar 2023 • Hangdi Xing, Feiyu Gao, Rujiao Long, Jiajun Bu, Qi Zheng, Liangcheng Li, Cong Yao, Zhi Yu
Table structure recognition (TSR) aims at extracting tables in images into machine-understandable formats.
2 code implementations • 8 Sep 2022 • Cheng Da, Peng Wang, Cong Yao
A novel scene text recognizer based on Vision-Language Transformer (VLT) is presented.
2 code implementations • 8 Sep 2022 • Peng Wang, Cheng Da, Cong Yao
In this work, we first draw inspiration from the recent progress in Vision Transformer (ViT) to construct a conceptually simple yet powerful vision STR model, which is built upon ViT and outperforms previous state-of-the-art models for scene text recognition, including both pure vision models and language-augmented methods.
Ranked #2 on Scene Text Recognition on Uber-Text (using extra training data)
no code implementations • 27 Jun 2022 • Chuwei Luo, Guozhi Tang, Qi Zheng, Cong Yao, Lianwen Jin, Chenliang Li, Yang Xue, Luo Si
Multi-modal document pre-trained models have proven to be very effective in a variety of visually-rich document understanding (VrDU) tasks.
2 code implementations • CVPR 2022 • Sibo Song, Jianqiang Wan, Zhibo Yang, Jun Tang, Wenqing Cheng, Xiang Bai, Cong Yao
In this paper, we specifically adapt vision-language joint learning for scene text detection, a task that intrinsically involves cross-modal interaction between the two modalities: vision and language, since text is the written form of language.
no code implementations • CVPR 2022 • Xiangwei Jiang, Rujiao Long, Nan Xue, Zhibo Yang, Cong Yao, Gui-Song Xia
This paper addresses the problem of document image dewarping, which aims at eliminating the geometric distortion in document images for document digitization.
Ranked #3 on Local Distortion on DocUNet
5 code implementations • 21 Feb 2022 • Minghui Liao, Zhisheng Zou, Zhaoyi Wan, Cong Yao, Xiang Bai
By incorporating the proposed DB and ASF with the segmentation network, our proposed scene text detector consistently achieves state-of-the-art results, in terms of both detection accuracy and speed, on five standard benchmarks.
Ranked #3 on Scene Text Detection on MSRA-TD500
no code implementations • 20 Oct 2021 • Humen Zhong, Jun Tang, Wenhai Wang, Zhibo Yang, Cong Yao, Tong Lu
Recent approaches for end-to-end text spotting have achieved promising results.
no code implementations • 7 Apr 2021 • Zhaoyi Wan, Haoran Chen, Jielei Zhang, Wentao Jiang, Cong Yao, Jiebo Luo
In this paper, we address the problem of makeup transfer, which aims at transplanting the makeup from the reference face to the source face while preserving the identity of the source.
no code implementations • CVPR 2021 • Minghang He, Minghui Liao, Zhibo Yang, Humen Zhong, Jun Tang, Wenqing Cheng, Cong Yao, Yongpan Wang, Xiang Bai
Over the past few years, the field of scene text detection has progressed so rapidly that modern text detectors are able to hunt text in various challenging scenarios.
1 code implementation • 17 Nov 2020 • Zhaoyi Wan, Yimin Chen, Sutao Deng, Kunpeng Chen, Cong Yao, Jiebo Luo
In this paper, we are concerned with the detection of a particular type of object with extreme aspect ratios, namely slender objects.
Ranked #1 on Object Detection on COCO+
no code implementations • ECCV 2020 • Yushuo Guan, Pengyu Zhao, Bingxuan Wang, Yuanxing Zhang, Cong Yao, Kaigui Bian, Jian Tang
To tackle both the efficiency and the effectiveness of knowledge distillation, we introduce feature aggregation to imitate multi-teacher distillation within the single-teacher distillation framework by extracting informative supervision from multiple teacher feature maps.
no code implementations • CVPR 2020 • Zhaoyi Wan, Jielei Zhang, Liang Zhang, Jiebo Luo, Cong Yao
This remedy alleviates the problem of vocabulary reliance and improves the overall scene text recognition performance.
3 code implementations • CVPR 2020 • Shangbang Long, Cong Yao
Synthetic data has been a critical tool for training scene text detection and recognition models.
no code implementations • 10 Feb 2020 • Shangbang Long, Yushuo Guan, Kaigui Bian, Cong Yao
Irregular scene text recognition has attracted much attention from the research community, mainly due to the complexity of text shapes in natural scenes.
no code implementations • 28 Dec 2019 • Zhaoyi Wan, Minghang He, Haoran Chen, Xiang Bai, Cong Yao
Driven by deep learning and the large volume of data, scene text recognition has evolved rapidly in recent years.
Ranked #19 on Scene Text Recognition on ICDAR2015
15 code implementations • 20 Nov 2019 • Minghui Liao, Zhaoyi Wan, Cong Yao, Kai Chen, Xiang Bai
Recently, segmentation-based methods have become quite popular in scene text detection, as the segmentation results can more accurately describe scene text of various shapes, such as curved text.
Ranked #6 on Scene Text Detection on MSRA-TD500
1 code implementation • 30 Aug 2019 • Shangbang Long, Yushuo Guan, Bingxuan Wang, Kaigui Bian, Cong Yao
Reading text from natural images is challenging due to the great variety in text font, color, size, background complexity, etc.
1 code implementation • ECCV 2018 • Minghui Liao, Pengyuan Lyu, Minghang He, Cong Yao, Wenhao Wu, Xiang Bai
Moreover, we further investigate the recognition module of our method separately, which significantly outperforms state-of-the-art methods on both regular and irregular text datasets for scene text recognition.
no code implementations • ICCV 2019 • MingKun Yang, Yushuo Guan, Minghui Liao, Xin He, Kaigui Bian, Song Bai, Cong Yao, Xiang Bai
Reading text in the wild is a very challenging task due to the diversity of text instances and the complexity of natural scenes.
no code implementations • 23 Jul 2019 • Zhaoyi Wan, Fengming Xie, Yibo Liu, Xiang Bai, Cong Yao
Scene text recognition has been an important, active research topic in computer vision for years.
1 code implementation • 13 Jul 2019 • Minghui Liao, Boyu Song, Shangbang Long, Minghang He, Cong Yao, Xiang Bai
Different from the previous methods which paste the rendered text on static 2D images, our method can render the 3D virtual scene and text instances as an entirety.
2 code implementations • 21 Nov 2018 • Enze Xie, Yuhang Zang, Shuai Shao, Gang Yu, Cong Yao, Guangyao Li
We propose a supervised pyramid context network (SPCNET) to precisely locate text regions while suppressing false positives.
Ranked #2 on Scene Text Detection on ICDAR 2013
1 code implementation • 10 Nov 2018 • Shangbang Long, Xin He, Cong Yao
As an important research area in computer vision, scene text detection and recognition has been inescapably influenced by this wave of revolution, consequently entering the era of deep learning.
no code implementations • 18 Sep 2018 • Minghui Liao, Jian Zhang, Zhaoyi Wan, Fengming Xie, Jiajun Liang, Pengyuan Lyu, Cong Yao, Xiang Bai
Inspired by speech recognition, recent state-of-the-art algorithms mostly consider scene text recognition as a sequence prediction problem.
Ranked #32 on Scene Text Recognition on SVT
1 code implementation • ECCV 2018 • Pengyuan Lyu, Minghui Liao, Cong Yao, Wenhao Wu, Xiang Bai
Recently, models based on deep neural networks have dominated the fields of scene text detection and recognition.
Ranked #3 on Scene Text Detection on ICDAR 2013
3 code implementations • ECCV 2018 • Shangbang Long, Jiaqiang Ruan, Wenjie Zhang, Xin He, Wenhao Wu, Cong Yao
Driven by deep neural networks and large scale datasets, scene text detection methods have progressed substantially over the past years, continuously refreshing the performance records on various standard benchmarks.
Ranked #2 on Curved Text Detection on SCUT-CTW1500
3 code implementations • 2018 • Baoguang Shi, Mingkun Yang, Xinggang Wang, Pengyuan Lyu, Cong Yao, Xiang Bai
Scene text recognition has attracted great interest from academia and industry in recent years owing to its importance in a wide range of applications.
Ranked #22 on Scene Text Recognition on ICDAR2015
1 code implementation • CVPR 2018 • Pengyuan Lyu, Cong Yao, Wenhao Wu, Shuicheng Yan, Xiang Bai
We propose to detect scene text by localizing corner points of text bounding boxes and segmenting text regions in relative positions.
Ranked #2 on Scene Text Detection on ICDAR 2017 MLT
5 code implementations • 31 Aug 2017 • Baoguang Shi, Cong Yao, Minghui Liao, Mingkun Yang, Pei Xu, Linyan Cui, Serge Belongie, Shijian Lu, Xiang Bai
This report introduces RCTW, a new competition that focuses on Chinese text reading.
no code implementations • 27 Jun 2017 • Pengyuan Lyu, Xiang Bai, Cong Yao, Zhen Zhu, Tengteng Huang, Wenyu Liu
In this paper, we investigate the Chinese calligraphy synthesis problem: synthesizing Chinese calligraphy images with a specified style from a standard font (e.g.
no code implementations • 12 Jun 2017 • Xinggang Wang, Kaibing Chen, Zilong Huang, Cong Yao, Wenyu Liu
Deep ConvNet-based object detectors, e.g., Faster R-CNN, YOLO, and SSD, mainly focus on regressing the coordinates of bounding boxes.
32 code implementations • CVPR 2017 • Xinyu Zhou, Cong Yao, He Wen, Yuzhi Wang, Shuchang Zhou, Weiran He, Jiajun Liang
Previous approaches for scene text detection have already achieved promising performances across various benchmarks.
Ranked #3 on Scene Text Detection on COCO-Text
no code implementations • 1 Dec 2016 • He Wen, Shuchang Zhou, Zhe Liang, Yuxiang Zhang, Dieqiao Feng, Xinyu Zhou, Cong Yao
Fully convolutional neural networks give accurate, per-pixel prediction for input images and have applications like semantic segmentation.
2 code implementations • 30 Nov 2016 • Qinyao He, He Wen, Shuchang Zhou, Yuxin Wu, Cong Yao, Xinyu Zhou, Yuheng Zou
In addition, we propose balanced quantization methods for weights to further reduce performance degradation.
no code implementations • 29 Jun 2016 • Cong Yao, Xiang Bai, Nong Sang, Xinyu Zhou, Shuchang Zhou, Zhimin Cao
Recently, scene text detection has become an active research topic in computer vision and document analysis, because of its great importance and significant challenge.
Ranked #6 on Scene Text Detection on COCO-Text
1 code implementation • CVPR 2016 • Zheng Zhang, Chengquan Zhang, Wei Shen, Cong Yao, Wenyu Liu, Xiang Bai
In this paper, we propose a novel approach for text detection in natural images.
Ranked #40 on Scene Text Detection on ICDAR 2015
5 code implementations • CVPR 2016 • Baoguang Shi, Xinggang Wang, Pengyuan Lyu, Cong Yao, Xiang Bai
We show that the model is able to recognize several types of irregular text, including perspective text and curved text.
Ranked #10 on Scene Text Recognition on ICDAR 2003
no code implementations • 30 Nov 2015 • Cong Yao, Jia-Nan Wu, Xinyu Zhou, Chi Zhang, Shuchang Zhou, Zhimin Cao, Qi Yin
Different from focused texts present in natural images, which are captured with the user's intention and intervention, incidental texts usually exhibit much more diversity, variability and complexity, thus posing significant difficulties and challenges for scene text detection and recognition algorithms.
no code implementations • ICCV 2015 • Xinggang Wang, Zhuotun Zhu, Cong Yao, Xiang Bai
Multiple-instance learning (MIL) has served as an important tool for a wide range of vision applications, for instance, image classification, object detection, and visual tracking.
83 code implementations • 21 Jul 2015 • Baoguang Shi, Xiang Bai, Cong Yao
In this paper, we investigate the problem of scene text recognition, which is among the most important and challenging tasks in image-based sequence recognition.
Ranked #12 on Scene Text Recognition on ICDAR 2003
no code implementations • 10 Jun 2015 • Xinyu Zhou, Shuchang Zhou, Cong Yao, Zhimin Cao, Qi Yin
Recently, text detection and recognition in natural scenes have become increasingly popular in the computer vision community as well as the document analysis community.
no code implementations • CVPR 2015 • Zheng Zhang, Wei Shen, Cong Yao, Xiang Bai
Recently, a variety of real-world applications have triggered huge demand for techniques that can extract textual information from natural scenes.
no code implementations • 12 May 2015 • Baoguang Shi, Cong Yao, Chengquan Zhang, Xiaowei Guo, Feiyue Huang, Xiang Bai
With the rapid increase of transnational communication and cooperation, people frequently encounter multilingual scenarios in various situations.
no code implementations • 25 Sep 2014 • Zhuotun Zhu, Xinggang Wang, Song Bai, Cong Yao, Xiang Bai
By combining the global deep learning representation and the local descriptor representation, our method can obtain state-of-the-art performance on 3D shape retrieval benchmarks.
no code implementations • CVPR 2014 • Cong Yao, Xiang Bai, Baoguang Shi, Wenyu Liu
Driven by the wide range of applications, scene text detection and recognition have become active research topics in computer vision.