no code implementations • 23 May 2024 • Chenglong Liu, Haoran Wei, Jinyue Chen, Lingyu Kong, Zheng Ge, Zining Zhu, Liang Zhao, Jianjian Sun, Chunrui Han, Xiangyu Zhang
Modern LVLMs still struggle with fine-grained document understanding, such as OCR, translation, or captioning for user-specified regions of interest: tasks that require context from the entire page, or even across multiple pages.
1 code implementation • 15 Apr 2024 • Jinyue Chen, Lingyu Kong, Haoran Wei, Chenglong Liu, Zheng Ge, Liang Zhao, Jianjian Sun, Chunrui Han, Xiangyu Zhang
To address this, we propose OneChart: a reliable agent specifically devised for the structural extraction of chart information.
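As a rough illustration of what "structural extraction of chart information" means, the sketch below packs parsed chart elements into a structured record. The schema (`title` / `x_axis` / `values`) and the helper `chart_to_dict` are hypothetical, not OneChart's actual output format.

```python
# Hypothetical sketch: turning parsed chart elements into a
# structured record, the kind of output a chart-extraction
# agent like OneChart targets. Schema is an assumption.
import json


def chart_to_dict(title, x_labels, y_values):
    """Pack parsed chart elements into a structured record."""
    return {
        "title": title,
        "x_axis": list(x_labels),
        "values": {x: y for x, y in zip(x_labels, y_values)},
    }


record = chart_to_dict("Quarterly revenue", ["Q1", "Q2", "Q3"], [1.2, 1.5, 1.8])
print(json.dumps(record, indent=2))
```

A downstream consumer can then query individual values (e.g. `record["values"]["Q2"]`) instead of re-reading the rendered chart.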
no code implementations • 23 Jan 2024 • Haoran Wei, Lingyu Kong, Jinyue Chen, Liang Zhao, Zheng Ge, En Yu, Jianjian Sun, Chunrui Han, Xiangyu Zhang
In Vary-toy, we introduce an improved vision vocabulary, allowing the model not only to retain all of Vary's capabilities but also to achieve greater generality.
Ranked #81 on Visual Question Answering on MM-Vet
1 code implementation • 11 Dec 2023 • Haoran Wei, Lingyu Kong, Jinyue Chen, Liang Zhao, Zheng Ge, Jinrong Yang, Jianjian Sun, Chunrui Han, Xiangyu Zhang
Accordingly, we propose Vary, an efficient and effective method to scale up the vision vocabulary of LVLMs.
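At a high level, "scaling up the vision vocabulary" can be pictured as fusing features from an additional vision encoder with those of the original one before they reach the LLM. The sketch below uses random projections as stand-ins for the two encoders; dimensions and the channel-wise concatenation are illustrative assumptions, not Vary's exact architecture.

```python
# Hedged sketch: fusing an original vision vocabulary with a new,
# task-specific one by channel-wise concatenation. The encoders are
# simulated with random features; shapes are illustrative only.
import numpy as np

rng = np.random.default_rng(0)
N_TOKENS, D_ORIG, D_NEW = 256, 1024, 1024

orig_features = rng.standard_normal((N_TOKENS, D_ORIG))  # original vocabulary
new_features = rng.standard_normal((N_TOKENS, D_NEW))    # added vocabulary

# Concatenate the two vocabularies per visual token before the LLM.
fused = np.concatenate([orig_features, new_features], axis=-1)
print(fused.shape)
```

The LLM then consumes the fused tokens, so the added vocabulary extends perception (e.g. for dense text or charts) without discarding the original features.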
Ranked #56 on Visual Question Answering on MM-Vet