1 code implementation • 30 Apr 2024 • Yuliang Liu, Mingxin Huang, Hao Yan, Linger Deng, Weijia Wu, Hao Lu, Chunhua Shen, Lianwen Jin, Xiang Bai
Specifically, we propose a Prompt Queries Generation Module and a Tasks-aware Adapter to effectively convert the original single-task model into a multi-task model suitable for both image and video scenarios with minimal additional parameters.
no code implementations • 19 Apr 2024 • Jingqun Tang, Chunhui Lin, Zhen Zhao, Shu Wei, Binghong Wu, Qi Liu, Hao Feng, Yang Li, Siqi Wang, Lei Liao, Wei Shi, Yuliang Liu, Hao Liu, Yuan Xie, Xiang Bai, Can Huang
Text-centric visual question answering (VQA) has made great strides with the development of Multimodal Large Language Models (MLLMs), yet open-source models still fall short of leading models like GPT4V and Gemini, partly due to a lack of extensive, high-quality instruction tuning data.
3 code implementations • 6 Apr 2024 • Mingxin Huang, Hongliang Li, Yuliang Liu, Xiang Bai, Lianwen Jin
Subsequently, we introduce a Bridge that connects the locked detector and recognizer through a zero-initialized neural network.
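As a rough illustration of why zero initialization suits a module that joins two locked components, the sketch below adds the output of an all-zero linear layer to frozen features. The residual form, the layer shape, and every name here (`make_zero_bridge`, `bridged_features`) are assumptions made for illustration; the listing does not describe the actual Bridge architecture.

```python
def linear(weights, bias, x):
    """Apply y = W x + b, with W given as a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def make_zero_bridge(dim):
    """Hypothetical bridge layer whose weights and bias all start at zero."""
    weights = [[0.0] * dim for _ in range(dim)]
    bias = [0.0] * dim
    return weights, bias


def bridged_features(detector_features, bridge):
    """Add the bridge output to the frozen detector features.

    The residual connection is an assumption of this sketch, not a detail
    taken from the paper.
    """
    weights, bias = bridge
    delta = linear(weights, bias, detector_features)
    return [f + d for f, d in zip(detector_features, delta)]


features = [0.5, -1.0, 2.0]          # stand-in for locked detector output
bridge = make_zero_bridge(len(features))
out = bridged_features(features, bridge)
```

Under these assumptions, `out` equals `features` before any training step, so the recognizer initially sees exactly what the locked detector produced, and the bridge only starts to alter the features once its weights are updated.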
1 code implementation • 28 Mar 2024 • Jianqiang Wan, Sibo Song, Wenwen Yu, Yuliang Liu, Wenqing Cheng, Fei Huang, Xiang Bai, Cong Yao, Zhibo Yang
Recently, visually-situated text parsing (VsTP) has experienced notable advancements, driven by the increasing demand for automated document understanding and the emergence of Generative Large Language Models (LLMs) capable of processing document-based questions.
1 code implementation • 7 Mar 2024 • Yuliang Liu, Biao Yang, Qiang Liu, Zhang Li, Zhiyin Ma, Shuo Zhang, Xiang Bai
We present TextMonkey, a large multimodal model (LMM) tailored for text-centric tasks.
1 code implementation • 5 Feb 2024 • Yang Jin, Zhicheng Sun, Kun Xu, Liwei Chen, Hao Jiang, Quzhe Huang, Chengru Song, Yuliang Liu, Di Zhang, Yang Song, Kun Gai, Yadong Mu
In light of recent advances in multimodal Large Language Models (LLMs), there is increasing attention to scaling them from image-text data to more informative real-world videos.
Ranked #64 on Visual Question Answering on MM-Vet
no code implementations • 27 Jan 2024 • Pengjie Wang, Kaile Zhang, Xinyu Wang, Shengwei Han, Yongge Liu, Jinpeng Wan, Haisu Guan, Zhebin Kuang, Lianwen Jin, Xiang Bai, Yuliang Liu
Additionally, all images and labels have been reviewed and corrected by experts in oracle bone studies.
no code implementations • 23 Jan 2024 • Haisu Guan, Jinpeng Wan, Yuliang Liu, Pengjie Wang, Kaile Zhang, Zhebin Kuang, Xinyu Wang, Xiang Bai, Lianwen Jin
We conducted validation and simulated deciphering on the constructed dataset, and the results demonstrate its high efficacy in aiding the study of oracle bone script.
no code implementations • 15 Jan 2024 • Mingxin Huang, Dezhi Peng, Hongliang Li, Zhenghao Peng, Chongyu Liu, Dahua Lin, Yuliang Liu, Xiang Bai, Lianwen Jin
In this paper, we propose a new end-to-end scene text spotting framework termed SwinTextSpotter v2, which seeks to find a better synergy between text detection and recognition.
no code implementations • 21 Dec 2023 • Linger Deng, Mingxin Huang, Xudong Xie, Yuliang Liu, Lianwen Jin, Xiang Bai
We demonstrate the accuracy of the generated polygons through extensive experiments: 1) By creating polygons from ground truth points, we achieved an accuracy of 82.0% on ICDAR 2015; 2) In training detectors with polygons generated by our method, we attained 86% of the accuracy relative to training with ground truth (GT); 3) Additionally, the proposed Point2Polygon can be seamlessly integrated to empower single-point spotters to generate polygons.
no code implementations • 12 Dec 2023 • Dongliang Luo, Yuliang Liu, Rui Yang, Xianjin Liu, Jishen Zeng, Yu Zhou, Xiang Bai
With the surge in realistic text tampering, detecting fraudulent text in images has gained prominence for maintaining information security.
no code implementations • 28 Nov 2023 • Ling Fu, Zijie Wu, Yingying Zhu, Yuliang Liu, Xiang Bai
We contend that one main limitation of existing generation methods is the insufficient integration of foreground text with the background.
1 code implementation • 16 Nov 2023 • Yuliang Liu, Xiangru Tang, Zefan Cai, Junjie Lu, Yichi Zhang, Yanjun Shao, Zexuan Deng, Helan Hu, Kaikai An, Ruijun Huang, Shuzheng Si, Sheng Chen, Haozhe Zhao, Liang Chen, Yan Wang, Tianyu Liu, Zhiwei Jiang, Baobao Chang, Yujia Qin, Wangchunshu Zhou, Yilun Zhao, Arman Cohan, Mark Gerstein
While Large Language Models (LLMs) have demonstrated proficiency in code generation benchmarks, translating these results into practical development scenarios - where leveraging existing repository-level libraries is the norm - remains challenging.
1 code implementation • 11 Nov 2023 • Zhang Li, Biao Yang, Qiang Liu, Zhiyin Ma, Shuo Zhang, Jingxu Yang, Yabo Sun, Yuliang Liu, Xiang Bai
Additionally, experiments on 18 datasets further demonstrate that Monkey surpasses existing LMMs in many tasks like Image Captioning and various Visual Question Answering formats.
no code implementations • 11 Oct 2023 • Jiayi Fu, Lei Lin, Xiaoyang Gao, Pengli Liu, Zhengzong Chen, Zhirui Yang, ShengNan Zhang, Xue Zheng, Yan Li, Yuliang Liu, Xucheng Ye, Yiqiao Liao, Chao Liao, Bin Chen, Chengru Song, Junchen Wan, Zijia Lin, Fuzheng Zhang, Zhongyuan Wang, Di Zhang, Kun Gai
Recent advancements in large language models (LLMs) have demonstrated remarkable abilities in handling a variety of natural language processing (NLP) downstream tasks, even on mathematical tasks requiring multi-step reasoning.
Ranked #88 on Arithmetic Reasoning on GSM8K (using extra training data)
1 code implementation • 21 Aug 2023 • Wenwen Yu, Yuliang Liu, Xingkui Zhu, Haoyu Cao, Xing Sun, Xiang Bai
Utilizing only 10% of the supervised data, FastTCM-CR50 improves performance by an average of 26.5% and 5.5% for text detection and spotting tasks, respectively.
3 code implementations • ICCV 2023 • Mingxin Huang, Jiaxin Zhang, Dezhi Peng, Hao Lu, Can Huang, Yuliang Liu, Xiang Bai, Lianwen Jin
To this end, we introduce a new model named Explicit Synergy-based Text Spotting Transformer framework (ESTextSpotter), which achieves explicit synergy by modeling discriminative and interactive features for text detection and recognition within a single decoder.
1 code implementation • 17 Jul 2023 • Wenze Liu, Hao Lu, Yuliang Liu, Zhiguo Cao
In DAB-DETR, such queries are modulated by the so-called conditional linear projection at each decoder stage, aiming to search for positions of interest such as the four extremities of the box.
2 code implementations • 17 Jul 2023 • Wenze Liu, Hao Lu, Yuliang Liu, Zhiguo Cao
We introduce the notion of point affiliation into feature upsampling.
1 code implementation • 21 Jun 2023 • Dezhi Peng, Chongyu Liu, Yuliang Liu, Lianwen Jin
As ViTEraser implicitly integrates text localization and inpainting, we propose a novel end-to-end pretraining method, termed SegMIM, which focuses the encoder and decoder on the text box segmentation and masked image modeling tasks, respectively.
1 code implementation • 6 Jun 2023 • Wenwen Yu, MingYu Liu, Biao Yang, Enming Zhang, Deqiang Jiang, Xing Sun, Yuliang Liu, Xiang Bai
Text recognition in the wild is a long-standing problem in computer vision.
no code implementations • 5 Jun 2023 • Wenwen Yu, Chengquan Zhang, Haoyu Cao, Wei Hua, Bohan Li, Huang Chen, MingYu Liu, Mingrui Chen, Jianfeng Kuang, Mengjun Cheng, Yuning Du, Shikun Feng, Xiaoguang Hu, Pengyuan Lyu, Kun Yao, Yuechen Yu, Yuliang Liu, Wanxiang Che, Errui Ding, Cheng-Lin Liu, Jiebo Luo, Shuicheng Yan, Min Zhang, Dimosthenis Karatzas, Xing Sun, Jingdong Wang, Xiang Bai
It is hoped that this competition will attract many researchers in the fields of CV and NLP and bring new ideas to the field of Document AI.
1 code implementation • 13 May 2023 • Yuliang Liu, Zhang Li, Biao Yang, Chunyuan Li, XuCheng Yin, Cheng-Lin Liu, Lianwen Jin, Xiang Bai
In this paper, we conducted a comprehensive evaluation of Large Multimodal Models, such as GPT4V and Gemini, in various text-related visual tasks including Text Recognition, Scene Text-Centric Visual Question Answering (VQA), Document-Oriented VQA, Key Information Extraction (KIE), and Handwritten Mathematical Expression Recognition (HMER).
no code implementations • 24 Apr 2023 • Wenwen Yu, MingYu Liu, Mingrui Chen, Ning Lu, Yinlong Wen, Yuliang Liu, Dimosthenis Karatzas, Xiang Bai
To promote research in this area, we organized the ICDAR 2023 competition on reading the seal title (ReST), which included two tasks: seal title text detection (Task 1) and end-to-end seal title recognition (Task 2).
2 code implementations • 10 Mar 2023 • Zhiwei Zhang, Yuliang Liu
This stream is subsequently fed into the decoder-based transformer to generate visual re-creations and textual feedback in the second stage.
1 code implementation • CVPR 2023 • Wenwen Yu, Yuliang Liu, Wei Hua, Deqiang Jiang, Bo Ren, Xiang Bai
Recently, pretraining approaches based on vision language models have made effective progress in the field of text detection.
1 code implementation • 6 Feb 2023 • Yuliang Liu, Shenggui Li, Jiarui Fang, Yanjun Shao, Boyuan Yao, Yang You
To address these challenges, we introduce a system that can jointly optimize distributed execution and gradient checkpointing plans.
3 code implementations • 4 Jan 2023 • Yuliang Liu, Jiaxin Zhang, Dezhi Peng, Mingxin Huang, Xinyu Wang, Jingqun Tang, Can Huang, Dahua Lin, Chunhua Shen, Xiang Bai, Lianwen Jin
Within the context of our SPTS v2 framework, our experiments suggest a potential preference for single-point representation in scene text spotting when compared to other representations.
Ranked #15 on Text Spotting on ICDAR 2015
1 code implementation • CVPR 2023 • Chenfan Qu, Chongyu Liu, Yuliang Liu, Xinhong Chen, Dezhi Peng, Fengjun Guo, Lianwen Jin
In this paper, we propose a novel framework to capture more fine-grained clues in complex scenarios for tampered text detection, termed Document Tampering Detector (DTD), which consists of a Frequency Perception Head (FPH) to compensate for the deficiencies caused by inconspicuous visual features, and a Multi-view Iterative Decoder (MID) for fully utilizing the information of features at different scales.
1 code implementation • 17 Oct 2022 • Peirong Zhang, Jiajia Jiang, Yuliang Liu, Lianwen Jin
MSDS-ChS consists of handwritten Chinese signatures, which, to the best of our knowledge, is the largest publicly available Chinese signature dataset for handwriting verification, at least eight times larger than existing online datasets.
2 code implementations • 26 Sep 2022 • Hao Lu, Wenze Liu, Zixuan Ye, Hongtao Fu, Yuliang Liu, Zhiguo Cao
We introduce point affiliation into feature upsampling, a notion that describes the affiliation of each upsampled point to a semantic cluster formed by local decoder feature points with semantic similarity.
Ranked #5 on Feature Upsampling on ImageNet
4 code implementations • 29 Jul 2022 • Dezhi Peng, Lianwen Jin, Yuliang Liu, Canjie Luo, Songxuan Lai
Utilizing the proposed weakly supervised learning framework, PageNet requires only transcripts to be annotated for real data; however, it can still output detection and recognition results at both the character and line levels, avoiding the labor and cost of labeling bounding boxes of characters and text lines.
1 code implementation • 21 Jul 2022 • Chongyu Liu, Lianwen Jin, Yuliang Liu, Canjie Luo, Bangdong Chen, Fengjun Guo, Kai Ding
To address this issue, we propose a Contextual-guided Text Removal Network, termed CTRNet.
1 code implementation • 30 Mar 2022 • Yu Tang, Chenyu Wang, Yufan Zhang, Yuliang Liu, Xingcheng Zhang, Linbo Qiao, Zhiquan Lai, Dongsheng Li
To the best of our knowledge, we are the first to build a reasonable dynamic runtime scheduler that combines tensor swapping and tensor recomputation without user oversight.
2 code implementations • CVPR 2022 • Mingxin Huang, Yuliang Liu, Zhenghao Peng, Chongyu Liu, Dahua Lin, Shenggao Zhu, Nicholas Yuan, Kai Ding, Lianwen Jin
End-to-end scene text spotting has attracted great attention in recent years due to its success in exploiting the intrinsic synergy between scene text detection and recognition.
Ranked #3 on Text Spotting on Inverse-Text
1 code implementation • 15 Dec 2021 • Dezhi Peng, Xinyu Wang, Yuliang Liu, Jiaxin Zhang, Mingxin Huang, Songxuan Lai, Shenggao Zhu, Jing Li, Dahua Lin, Chunhua Shen, Xiang Bai, Lianwen Jin
For the first time, we demonstrate that scene text spotting models can be trained with an extremely low-cost annotation of a single point for each instance.
Ranked #3 on Text Spotting on SCUT-CTW1500
1 code implementation • 28 Oct 2021 • Shenggui Li, Hongxin Liu, Zhengda Bian, Jiarui Fang, Haichen Huang, Yuliang Liu, Boxiang Wang, Yang You
The success of Transformer models has pushed the deep learning model scale to billions of parameters.
1 code implementation • 12 Jul 2021 • Chun Chet Ng, Akmalul Khairi Bin Nazaruddin, Yeong Khang Lee, Xinyu Wang, Yuliang Liu, Chee Seng Chan, Lianwen Jin, Yipeng Sun, Lixin Fan
With hundreds of thousands of electronic chip components being manufactured every day, chip manufacturers have seen an increasing demand for a more efficient and effective way of inspecting the quality of printed text on chip components.
1 code implementation • 8 May 2021 • Yuliang Liu, Chunhua Shen, Lianwen Jin, Tong He, Peng Chen, Chongyu Liu, Hao Chen
Previous methods can be roughly categorized into two groups: character-based and segmentation-based, which often require character-level annotations and/or complex post-processing due to the unstructured output.
Ranked #7 on Text Spotting on Inverse-Text
2 code implementations • 1 Jun 2020 • Chenyu Gao, Qi Zhu, Peng Wang, Hui Li, Yuliang Liu, Anton Van Den Hengel, Qi Wu
In this paper, we propose an end-to-end structured multimodal attention (SMA) neural network to mainly solve the first two issues above.
15 code implementations • CVPR 2020 • Yuliang Liu, Hao Chen, Chunhua Shen, Tong He, Lianwen Jin, Liangwei Wang
Our contributions are three-fold: 1) For the first time, we adaptively fit arbitrarily-shaped text by a parameterized Bezier curve.
Ranked #9 on Text Spotting on Inverse-Text
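To make the idea of a parameterized Bezier curve concrete, here is a minimal sketch that evaluates a curve from its control points using the Bernstein basis. The cubic order, the control point values, and the function names are illustrative assumptions; how the paper fits control points to text boundaries is not covered by this listing.

```python
from math import comb


def bezier_point(control_points, t):
    """Evaluate a Bezier curve at t in [0, 1] using the Bernstein basis."""
    n = len(control_points) - 1  # curve degree
    x = y = 0.0
    for i, (px, py) in enumerate(control_points):
        basis = comb(n, i) * (t ** i) * ((1 - t) ** (n - i))
        x += basis * px
        y += basis * py
    return (x, y)


def sample_curve(control_points, num=5):
    """Sample `num` evenly spaced parameter values along the curve."""
    return [bezier_point(control_points, k / (num - 1)) for k in range(num)]


# Four hypothetical control points describing one curved side of a text region.
ctrl = [(0.0, 0.0), (1.0, 2.0), (3.0, 2.0), (4.0, 0.0)]
points = sample_curve(ctrl, num=5)
```

The sampled curve starts at the first control point and ends at the last one, while the two inner control points pull the curve upward without lying on it. A curved boundary is therefore described by a handful of numbers rather than a dense polygon.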
no code implementations • CVPR 2020 • Xinyu Wang, Yuliang Liu, Chunhua Shen, Chun Chet Ng, Canjie Luo, Lianwen Jin, Chee Seng Chan, Anton Van Den Hengel, Liangwei Wang
Visual Question Answering (VQA) methods have made incredible progress, but suffer from a failure to generalize.
no code implementations • 13 Jan 2020 • Canjie Luo, Qingxiang Lin, Yuliang Liu, Lianwen Jin, Chunhua Shen
Furthermore, to tackle the issue of lacking paired training samples, we design an interactive joint training scheme, which shares attention masks from the recognizer to the discriminator, and enables the discriminator to extract the features of each character for further adversarial training.
1 code implementation • 20 Dec 2019 • Yuliang Liu, Tong He, Hao Chen, Xinyu Wang, Canjie Luo, Shuaitao Zhang, Chunhua Shen, Lianwen Jin
More importantly, based on OBD, we provide a detailed analysis of the impact of a collection of refinements, which may inspire others to build state-of-the-art text detectors.
Ranked #3 on Scene Text Detection on ICDAR 2017 MLT
no code implementations • 17 Sep 2019 • Yipeng Sun, Zihan Ni, Chee-Kheng Chng, Yuliang Liu, Canjie Luo, Chun Chet Ng, Junyu Han, Errui Ding, Jingtuo Liu, Dimosthenis Karatzas, Chee Seng Chan, Lianwen Jin
Robust text reading from street view images provides valuable information for various applications.
1 code implementation • 16 Sep 2019 • Chee-Kheng Chng, Yuliang Liu, Yipeng Sun, Chun Chet Ng, Canjie Luo, Zihan Ni, ChuanMing Fang, Shuaitao Zhang, Junyu Han, Errui Ding, Jingtuo Liu, Dimosthenis Karatzas, Chee Seng Chan, Lianwen Jin
This paper reports the ICDAR2019 Robust Reading Challenge on Arbitrary-Shaped Text (RRC-ArT) that consists of three major challenges: i) scene text detection, ii) scene text recognition, and iii) scene text spotting.
1 code implementation • 6 Jun 2019 • Yuliang Liu, Sheng Zhang, Lianwen Jin, Lele Xie, Yaqiang Wu, Zhepeng Wang
Scene text in the wild commonly exhibits highly variable characteristics.
Ranked #1 on Scene Text Detection on IC19-ReCTs (using extra training data)
2 code implementations • CVPR 2019 • Zecheng Xie, Yaoxiong Huang, Yuanzhi Zhu, Lianwen Jin, Yuliang Liu, Lele Xie
In this paper, we propose a novel method, aggregation cross-entropy (ACE), for sequence recognition from a brand new perspective.
1 code implementation • CVPR 2019 • Yuliang Liu, Lianwen Jin, Zecheng Xie, Canjie Luo, Shuaitao Zhang, Lele Xie
Evaluation protocols play a key role in the development of text detection methods.
3 code implementations • 3 Dec 2018 • Shuaitao Zhang, Yuliang Liu, Lianwen Jin, Yaoxiong Huang, Songxuan Lai
The feature of the former is first enhanced by a novel lateral connection structure and then refined by four carefully designed losses: multiscale regression loss and content loss, which capture the global discrepancy of different level features; texture loss and total variation loss, which primarily target filling the text region and preserving the reality of the background.
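Of the four losses named above, total variation is the simplest to illustrate. The sketch below computes a common anisotropic form, the sum of absolute differences between neighbouring pixels, on a small grayscale grid. This generic formulation is an assumption; the exact definition and weighting used in the paper are not given in this listing.

```python
def total_variation(image):
    """Anisotropic total variation of a 2D grid given as a list of rows."""
    height = len(image)
    width = len(image[0])
    tv = 0.0
    for r in range(height):
        for c in range(width):
            if c + 1 < width:   # horizontal neighbour
                tv += abs(image[r][c + 1] - image[r][c])
            if r + 1 < height:  # vertical neighbour
                tv += abs(image[r + 1][c] - image[r][c])
    return tv


flat = [[0.5, 0.5], [0.5, 0.5]]   # perfectly smooth patch
edge = [[0.0, 1.0], [0.0, 1.0]]   # patch with one vertical edge
tv_flat = total_variation(flat)
tv_edge = total_variation(edge)
```

A smooth patch scores zero and a patch with abrupt intensity changes scores higher, so minimizing this term discourages noisy artifacts in the filled-in region.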
1 code implementation • 16 Nov 2018 • Lele Xie, Yuliang Liu, Lianwen Jin, Zecheng Xie
However, the detection performance is sensitive to the setting of the anchor boxes.
no code implementations • 12 Nov 2017 • Sheng Zhang, Yuliang Liu, Lianwen Jin, Canjie Luo
In this paper, we propose a refined scene text detector with a novel Feature Enhancement Network (FEN) for Region Proposal and Text Detection Refinement.
no code implementations • CVPR 2017 • Yuliang Liu, Lianwen Jin
The effectiveness of our approach is evaluated on a public word-level, multi-oriented scene text database, ICDAR 2015 Robust Reading Competition Challenge 4 "Incidental scene text localization".