Search Results for author: Cong Yao

Found 58 papers, 31 papers with code

LayoutLLM: Layout Instruction Tuning with Large Language Models for Document Understanding

3 code implementations • 8 Apr 2024 • Chuwei Luo, Yufan Shen, Zhaoqing Zhu, Qi Zheng, Zhi Yu, Cong Yao

The core of LayoutLLM is a layout instruction tuning strategy, which is specially designed to enhance the comprehension and utilization of document layouts.

document understanding

997


OmniParser: A Unified Framework for Text Spotting, Key Information Extraction and Table Recognition

1 code implementation • 28 Mar 2024 • Jianqiang Wan, Sibo Song, Wenwen Yu, Yuliang Liu, Wenqing Cheng, Fei Huang, Xiang Bai, Cong Yao, Zhibo Yang

Recently, visually-situated text parsing (VsTP) has experienced notable advancements, driven by the increasing demand for automated document understanding and the emergence of Generative Large Language Models (LLMs) capable of processing document-based questions.

Decoder document understanding +4

997


HierCode: A Lightweight Hierarchical Codebook for Zero-shot Chinese Text Recognition

no code implementations • 20 Mar 2024 • Yuyi Zhang, Yuanzhi Zhu, Dezhi Peng, Peirong Zhang, Zhenhua Yang, Zhibo Yang, Cong Yao, Lianwen Jin

Text recognition, especially for complex scripts like Chinese, faces unique challenges due to its intricate character structures and vast vocabulary.

Zero-Shot Learning


LORE++: Logical Location Regression Network for Table Structure Recognition with Pre-training

no code implementations • 3 Jan 2024 • Rujiao Long, Hangdi Xing, Zhibo Yang, Qi Zheng, Zhi Yu, Cong Yao, Fei Huang

We model TSR as a logical location regression problem and propose a new TSR framework called LORE, standing for LOgical location REgression network, which for the first time regresses logical location as well as spatial location of table cells in a unified network.

regression


FontDiffuser: One-Shot Font Generation via Denoising Diffusion with Multi-Scale Content Aggregation and Style Contrastive Learning

1 code implementation • 19 Dec 2023 • Zhenhua Yang, Dezhi Peng, Yuxin Kong, Yuyi Zhang, Cong Yao, Lianwen Jin

Automatic font generation is an imitation task, which aims to create a font library that mimics the style of reference images while preserving the content from source images.

Contrastive Learning Denoising +3

188


DocXChain: A Powerful Open-Source Toolchain for Document Parsing and Beyond

1 code implementation • 19 Oct 2023 • Cong Yao

In this report, we introduce DocXChain, a powerful open-source toolchain for document parsing, which is designed and developed to automatically convert the rich information embodied in unstructured documents, such as text, tables and charts, into structured representations that are readable and manipulable by machines.

Document Layout Analysis document understanding +4

997


Vision Grid Transformer for Document Layout Analysis

1 code implementation • ICCV 2023 • Cheng Da, Chuwei Luo, Qi Zheng, Cong Yao

Document pre-trained models and grid-based models have proven to be very effective on various tasks in Document AI.

Ranked #1 on Document Layout Analysis on PubLayNet val

Document Layout Analysis document understanding +1

997


LISTER: Neighbor Decoding for Length-Insensitive Scene Text Recognition

1 code implementation • ICCV 2023 • Changxu Cheng, Peng Wang, Cheng Da, Qi Zheng, Cong Yao

The diversity in length constitutes a significant characteristic of text.

Decoder Scene Text Recognition

997


Multi-Granularity Prediction with Learnable Fusion for Scene Text Recognition

1 code implementation • 25 Jul 2023 • Cheng Da, Peng Wang, Cong Yao

Specifically, MGP-STR achieves an average recognition accuracy of $94\%$ on standard benchmarks for scene text recognition.

Language Modelling Optical Character Recognition (OCR) +1

997


Conditional Text Image Generation with Diffusion Models

no code implementations • CVPR 2023 • Yuanzhi Zhu, Zhaohai Li, Tianwei Wang, Mengchao He, Cong Yao

Current text recognition systems, including those for handwritten scripts and scene text, have relied heavily on image synthesis and augmentation, since it is difficult to realize real-world complexity and diversity through collecting and annotating enough real text images.

Domain Adaptation Image Generation


GeoLayoutLM: Geometric Pre-training for Visual Information Extraction

1 code implementation • CVPR 2023 • Chuwei Luo, Changxu Cheng, Qi Zheng, Cong Yao

Additionally, novel relation heads, which are pre-trained by the geometric pre-training tasks and fine-tuned for RE, are elaborately designed to enrich and enhance the feature representation.

Ranked #1 on Key Information Extraction on CORD

Document AI entity_extraction +3

997


Modeling Entities as Semantic Points for Visual Information Extraction in the Wild

no code implementations • CVPR 2023 • Zhibo Yang, Rujiao Long, Pengfei Wang, Sibo Song, Humen Zhong, Wenqing Cheng, Xiang Bai, Cong Yao

As the first contribution of this work, we curate and release a new dataset for VIE, in which the document images are much more challenging in that they are taken from real applications, and difficulties such as blur, partial occlusion, and printing shift are quite common.

Text Spotting


LORE: Logical Location Regression Network for Table Structure Recognition

1 code implementation • 7 Mar 2023 • Hangdi Xing, Feiyu Gao, Rujiao Long, Jiajun Bu, Qi Zheng, Liangcheng Li, Cong Yao, Zhi Yu

Table structure recognition (TSR) aims at extracting tables in images into machine-understandable formats.

regression Table Recognition

997


Levenshtein OCR

2 code implementations • 8 Sep 2022 • Cheng Da, Peng Wang, Cong Yao

A novel scene text recognizer based on Vision-Language Transformer (VLT) is presented.

Imitation Learning Optical Character Recognition (OCR) +1

997


Multi-Granularity Prediction for Scene Text Recognition

2 code implementations • 8 Sep 2022 • Peng Wang, Cheng Da, Cong Yao

In this work, we first draw inspiration from the recent progress in Vision Transformer (ViT) to construct a conceptually simple yet powerful vision STR model, which is built upon ViT and outperforms previous state-of-the-art models for scene text recognition, including both pure vision models and language-augmented methods.

Ranked #2 on Scene Text Recognition on Uber-Text (using extra training data)

Language Modelling Optical Character Recognition (OCR) +1

997


Bi-VLDoc: Bidirectional Vision-Language Modeling for Visually-Rich Document Understanding

no code implementations • 27 Jun 2022 • Chuwei Luo, Guozhi Tang, Qi Zheng, Cong Yao, Lianwen Jin, Chenliang Li, Yang Xue, Luo Si

Multi-modal document pre-trained models have proven to be very effective in a variety of visually-rich document understanding (VrDU) tasks.

Document Classification document understanding +2


Vision-Language Pre-Training for Boosting Scene Text Detectors

2 code implementations • CVPR 2022 • Sibo Song, Jianqiang Wan, Zhibo Yang, Jun Tang, Wenqing Cheng, Xiang Bai, Cong Yao

In this paper, we specifically adapt vision-language joint learning for scene text detection, a task that intrinsically involves cross-modal interaction between the two modalities: vision and language, since text is the written form of language.

Contrastive Learning Language Modelling +4

997


Revisiting Document Image Dewarping by Grid Regularization

no code implementations • CVPR 2022 • Xiangwei Jiang, Rujiao Long, Nan Xue, Zhibo Yang, Cong Yao, Gui-Song Xia

This paper addresses the problem of document image dewarping, which aims at eliminating the geometric distortion in document images for document digitization.

Ranked #3 on Local Distortion on DocUNet

Local Distortion Optical Flow Estimation


Real-Time Scene Text Detection with Differentiable Binarization and Adaptive Scale Fusion

5 code implementations • 21 Feb 2022 • Minghui Liao, Zhisheng Zou, Zhaoyi Wan, Cong Yao, Xiang Bai

By incorporating the proposed DB and ASF with the segmentation network, our proposed scene text detector consistently achieves state-of-the-art results, in terms of both detection accuracy and speed, on five standard benchmarks.

Ranked #3 on Scene Text Detection on MSRA-TD500

Binarization Model Optimization +3

38,845


ARTS: Eliminating Inconsistency between Text Detection and Recognition with Auto-Rectification Text Spotter

no code implementations • 20 Oct 2021 • Humen Zhong, Jun Tang, Wenhai Wang, Zhibo Yang, Cong Yao, Tong Lu

Recent approaches for end-to-end text spotting have achieved promising results.

Text Detection Text Spotting


Facial Attribute Transformers for Precise and Robust Makeup Transfer

no code implementations • 7 Apr 2021 • Zhaoyi Wan, Haoran Chen, Jielei Zhang, Wentao Jiang, Cong Yao, Jiebo Luo

In this paper, we address the problem of makeup transfer, which aims at transplanting the makeup from the reference face to the source face while preserving the identity of the source.

Attribute Face Generation


MOST: A Multi-Oriented Scene Text Detector with Localization Refinement

no code implementations • CVPR 2021 • Minghang He, Minghui Liao, Zhibo Yang, Humen Zhong, Jun Tang, Wenqing Cheng, Cong Yao, Yongpan Wang, Xiang Bai

Over the past few years, the field of scene text detection has progressed so rapidly that modern text detectors are able to hunt text in various challenging scenarios.

Scene Text Detection Text Detection


Slender Object Detection: Diagnoses and Improvements

1 code implementation • 17 Nov 2020 • Zhaoyi Wan, Yimin Chen, Sutao Deng, Kunpeng Chen, Cong Yao, Jiebo Luo

In this paper, we are concerned with the detection of a particular type of objects with extreme aspect ratios, namely slender objects.

Ranked #1 on Object Detection on COCO+

Object object-detection +1


Differentiable Feature Aggregation Search for Knowledge Distillation

no code implementations • ECCV 2020 • Yushuo Guan, Pengyu Zhao, Bingxuan Wang, Yuanxing Zhang, Cong Yao, Kaigui Bian, Jian Tang

To tackle both the efficiency and the effectiveness of knowledge distillation, we introduce the feature aggregation to imitate the multi-teacher distillation in the single-teacher distillation framework by extracting informative supervision from multiple teacher feature maps.

Knowledge Distillation Model Compression +1


On Vocabulary Reliance in Scene Text Recognition

no code implementations • CVPR 2020 • Zhaoyi Wan, Jielei Zhang, Liang Zhang, Jiebo Luo, Cong Yao

This remedy alleviates the problem of vocabulary reliance and improves the overall scene text recognition performance.

Scene Text Recognition


UnrealText: Synthesizing Realistic Scene Text Images from the Unreal World

3 code implementations • CVPR 2020 • Shangbang Long, Cong Yao

Synthetic data has been a critical tool for training scene text detection and recognition models.

Image Generation Scene Text Detection +2

235


A New Perspective for Flexible Feature Gathering in Scene Text Recognition Via Character Anchor Pooling

no code implementations • 10 Feb 2020 • Shangbang Long, Yushuo Guan, Kaigui Bian, Cong Yao

Irregular scene text recognition has attracted much attention from the research community, mainly due to the complexity of shapes of text in natural scene.

Scene Text Recognition


TextScanner: Reading Characters in Order for Robust Scene Text Recognition

no code implementations • 28 Dec 2019 • Zhaoyi Wan, Minghang He, Haoran Chen, Xiang Bai, Cong Yao

Driven by deep learning and the large volume of data, scene text recognition has evolved rapidly in recent years.

Ranked #19 on Scene Text Recognition on ICDAR2015

Position Scene Text Recognition +2


Real-time Scene Text Detection with Differentiable Binarization

15 code implementations • 20 Nov 2019 • Minghui Liao, Zhaoyi Wan, Cong Yao, Kai Chen, Xiang Bai

Recently, segmentation-based methods are quite popular in scene text detection, as the segmentation results can more accurately describe scene text of various shapes such as curve text.

Ranked #6 on Scene Text Detection on MSRA-TD500

Binarization Optical Character Recognition (OCR) +3

38,845


Rethinking Irregular Scene Text Recognition

1 code implementation • 30 Aug 2019 • Shangbang Long, Yushuo Guan, Bingxuan Wang, Kaigui Bian, Cong Yao

Reading text from natural images is challenging due to the great variety in text font, color, size, complex background, etc.

Scene Text Detection

218


Mask TextSpotter: An End-to-End Trainable Neural Network for Spotting Text with Arbitrary Shapes

1 code implementation • ECCV 2018 • Minghui Liao, Pengyuan Lyu, Minghang He, Cong Yao, Wenhao Wu, Xiang Bai

Moreover, we further investigate the recognition module of our method separately, which significantly outperforms state-of-the-art methods on both regular and irregular text datasets for scene text recognition.

Scene Text Recognition Semantic Segmentation +2

413


Symmetry-constrained Rectification Network for Scene Text Recognition

no code implementations • ICCV 2019 • MingKun Yang, Yushuo Guan, Minghui Liao, Xin He, Kaigui Bian, Song Bai, Cong Yao, Xiang Bai

Reading text in the wild is a very challenging task due to the diversity of text instances and the complexity of natural scenes.

Scene Text Recognition


2D-CTC for Scene Text Recognition

no code implementations • 23 Jul 2019 • Zhaoyi Wan, Fengming Xie, Yibo Liu, Xiang Bai, Cong Yao

Scene text recognition has been an important, active research topic in computer vision for years.

Decoder Scene Text Recognition +2


SynthText3D: Synthesizing Scene Text Images from 3D Virtual Worlds

1 code implementation • 13 Jul 2019 • Minghui Liao, Boyu Song, Shangbang Long, Minghang He, Cong Yao, Xiang Bai

Different from the previous methods which paste the rendered text on static 2D images, our method can render the 3D virtual scene and text instances as an entirety.

Image Generation Scene Text Detection +1

141


Scene Text Detection with Supervised Pyramid Context Network

2 code implementations • 21 Nov 2018 • Enze Xie, Yuhang Zang, Shuai Shao, Gang Yu, Cong Yao, Guangyao Li

We propose a supervised pyramid context network (SPCNET) to precisely locate text regions while suppressing false positives.

Ranked #2 on Scene Text Detection on ICDAR 2013

Instance Segmentation Scene Text Detection +2

120


Scene Text Detection and Recognition: The Deep Learning Era

1 code implementation • 10 Nov 2018 • Shangbang Long, Xin He, Cong Yao

As an important research area in computer vision, scene text detection and recognition has been inescapably influenced by this wave of revolution, consequentially entering the era of deep learning.

Scene Text Detection Text Detection


Scene Text Recognition from Two-Dimensional Perspective

no code implementations • 18 Sep 2018 • Minghui Liao, Jian Zhang, Zhaoyi Wan, Fengming Xie, Jiajun Liang, Pengyuan Lyu, Cong Yao, Xiang Bai

Inspired by speech recognition, recent state-of-the-art algorithms mostly consider scene text recognition as a sequence prediction problem.

Ranked #32 on Scene Text Recognition on SVT

Scene Text Recognition Semantic Segmentation +4


Mask TextSpotter: An End-to-End Trainable Neural Network for Spotting Text with Arbitrary Shapes

1 code implementation • ECCV 2018 • Pengyuan Lyu, Minghui Liao, Cong Yao, Wenhao Wu, Xiang Bai

Recently, models based on deep neural networks have dominated the fields of scene text detection and recognition.

Ranked #3 on Scene Text Detection on ICDAR 2013

Scene Text Detection Semantic Segmentation +2

261


TextSnake: A Flexible Representation for Detecting Text of Arbitrary Shapes

3 code implementations • ECCV 2018 • Shangbang Long, Jiaqiang Ruan, Wenjie Zhang, Xin He, Wenhao Wu, Cong Yao

Driven by deep neural networks and large scale datasets, scene text detection methods have progressed substantially over the past years, continuously refreshing the performance records on various standard benchmarks.

Ranked #2 on Curved Text Detection on SCUT-CTW1500

Curved Text Detection Text Detection

4,099


ASTER: An Attentional Scene Text Recognizer with Flexible Rectification

3 code implementations • TPAMI 2018 • Baoguang Shi, Mingkun Yang, Xinggang Wang, Pengyuan Lyu, Cong Yao, Xiang Bai

Scene text recognition has attracted great interest from academia and industry in recent years owing to its importance in a wide range of applications.

Ranked #22 on Scene Text Recognition on ICDAR2015

Optical Character Recognition Optical Character Recognition (OCR) +1

715


Multi-Oriented Scene Text Detection via Corner Localization and Region Segmentation

1 code implementation • CVPR 2018 • Pengyuan Lyu, Cong Yao, Wenhao Wu, Shuicheng Yan, Xiang Bai

We propose to detect scene text by localizing corner points of text bounding boxes and segmenting text regions in relative positions.

Ranked #2 on Scene Text Detection on ICDAR 2017 MLT

Multi-Oriented Scene Text Detection object-detection +2

315


ICDAR2017 Competition on Reading Chinese Text in the Wild (RCTW-17)

5 code implementations • 31 Aug 2017 • Baoguang Shi, Cong Yao, Minghui Liao, Mingkun Yang, Pei Xu, Linyan Cui, Serge Belongie, Shijian Lu, Xiang Bai

This report introduces RCTW, a new competition that focuses on Chinese text reading.

valid

2,895


Auto-Encoder Guided GAN for Chinese Calligraphy Synthesis

no code implementations • 27 Jun 2017 • Pengyuan Lyu, Xiang Bai, Cong Yao, Zhen Zhu, Tengteng Huang, Wenyu Liu

In this paper, we investigate the Chinese calligraphy synthesis problem: synthesizing Chinese calligraphy images with specified style from standard font(eg.

Image-to-Image Translation Translation


Point Linking Network for Object Detection

no code implementations • 12 Jun 2017 • Xinggang Wang, Kaibing Chen, Zilong Huang, Cong Yao, Wenyu Liu

Deep ConvNet-based object detectors, e.g., Faster R-CNN, YOLO and SSD, mainly focus on regressing the coordinates of bounding boxes.

Object object-detection +1


EAST: An Efficient and Accurate Scene Text Detector

32 code implementations • CVPR 2017 • Xinyu Zhou, Cong Yao, He Wen, Yuzhi Wang, Shuchang Zhou, Weiran He, Jiajun Liang

Previous approaches for scene text detection have already achieved promising performances across various benchmarks.

Ranked #3 on Scene Text Detection on COCO-Text

Curved Text Detection Optical Character Recognition (OCR) +1

38,845


Training Bit Fully Convolutional Network for Fast Semantic Segmentation

no code implementations • 1 Dec 2016 • He Wen, Shuchang Zhou, Zhe Liang, Yuxiang Zhang, Dieqiao Feng, Xinyu Zhou, Cong Yao

Fully convolutional neural networks give accurate, per-pixel prediction for input images and have applications like semantic segmentation.

Segmentation Semantic Segmentation


Effective Quantization Methods for Recurrent Neural Networks

2 code implementations • 30 Nov 2016 • Qinyao He, He Wen, Shuchang Zhou, Yuxin Wu, Cong Yao, Xinyu Zhou, Yuheng Zou

In addition, we propose balanced quantization methods for weights to further reduce performance degradation.

Quantization


Scene Text Detection via Holistic, Multi-Channel Prediction

no code implementations • 29 Jun 2016 • Cong Yao, Xiang Bai, Nong Sang, Xinyu Zhou, Shuchang Zhou, Zhimin Cao

Recently, scene text detection has become an active research topic in computer vision and document analysis, because of its great importance and significant challenge.

Ranked #6 on Scene Text Detection on COCO-Text

Scene Text Detection Semantic Segmentation +1


Multi-Oriented Text Detection with Fully Convolutional Networks

1 code implementation • CVPR 2016 • Zheng Zhang, Chengquan Zhang, Wei Shen, Cong Yao, Wenyu Liu, Xiang Bai

In this paper, we propose a novel approach for text detection in natural images.

Ranked #40 on Scene Text Detection on ICDAR 2015

Scene Text Detection Text Detection


Robust Scene Text Recognition with Automatic Rectification

5 code implementations • CVPR 2016 • Baoguang Shi, Xinggang Wang, Pengyuan Lyu, Cong Yao, Xiang Bai

We show that the model is able to recognize several types of irregular text, including perspective text and curved text.

Ranked #10 on Scene Text Recognition on ICDAR 2003

Optical Character Recognition (OCR) Scene Text Detection +1

38,845


Incidental Scene Text Understanding: Recent Progresses on ICDAR 2015 Robust Reading Competition Challenge 4

no code implementations • 30 Nov 2015 • Cong Yao, Jia-Nan Wu, Xinyu Zhou, Chi Zhang, Shuchang Zhou, Zhimin Cao, Qi Yin

Different from focused texts present in natural images, which are captured with user's intention and intervention, incidental texts usually exhibit much more diversity, variability and complexity, thus posing significant difficulties and challenges for scene text detection and recognition algorithms.

Scene Text Detection Text Detection


Relaxed Multiple-Instance SVM with Application to Object Discovery

no code implementations • ICCV 2015 • Xinggang Wang, Zhuotun Zhu, Cong Yao, Xiang Bai

Multiple-instance learning (MIL) has served as an important tool for a wide range of vision applications, for instance, image classification, object detection, and visual tracking.

General Classification Image Classification +6


An End-to-End Trainable Neural Network for Image-based Sequence Recognition and Its Application to Scene Text Recognition

83 code implementations • 21 Jul 2015 • Baoguang Shi, Xiang Bai, Cong Yao

In this paper, we investigate the problem of scene text recognition, which is among the most important and challenging tasks in image-based sequence recognition.

Ranked #12 on Scene Text Recognition on ICDAR 2003

Optical Character Recognition (OCR) Scene Text Recognition

38,845


ICDAR 2015 Text Reading in the Wild Competition

no code implementations • 10 Jun 2015 • Xinyu Zhou, Shuchang Zhou, Cong Yao, Zhimin Cao, Qi Yin

Recently, text detection and recognition in natural scenes are becoming increasingly popular in the computer vision community as well as the document analysis community.

Text Detection


Symmetry-Based Text Line Detection in Natural Scenes

no code implementations • CVPR 2015 • Zheng Zhang, Wei Shen, Cong Yao, Xiang Bai

Recently, a variety of real-world applications have triggered huge demand for techniques that can extract textual information from natural scenes.

Line Detection Scene Text Detection +1


Automatic Script Identification in the Wild

no code implementations • 12 May 2015 • Baoguang Shi, Cong Yao, Chengquan Zhang, Xiaowei Guo, Feiyue Huang, Xiang Bai

With the rapid increase of transnational communication and cooperation, people frequently encounter multilingual scenarios in various situations.

General Classification Image Classification


Deep Learning Representation using Autoencoder for 3D Shape Retrieval

no code implementations • 25 Sep 2014 • Zhuotun Zhu, Xinggang Wang, Song Bai, Cong Yao, Xiang Bai

By combining the global deep learning representation and the local descriptor representation, our method can obtain the state-of-the-art performance on 3D shape retrieval benchmarks.

3D Shape Classification 3D Shape Recognition +5


Strokelets: A Learned Multi-Scale Representation for Scene Text Recognition

no code implementations • CVPR 2014 • Cong Yao, Xiang Bai, Baoguang Shi, Wenyu Liu

Driven by the wide range of applications, scene text detection and recognition have become active research topics in computer vision.

Scene Text Detection Scene Text Recognition +1

