no code implementations • 9 Apr 2024 • Masato Fujitake
In this paper, we present a method for enhancing the accuracy of scene text recognition tasks by judging whether the image and text match each other.
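The idea of judging whether an image and a text string match can be illustrated with a toy similarity check. The sketch below is purely illustrative (the embedding functions, names, and threshold are invented assumptions, not the paper's method): it scores a candidate transcription against an image embedding and accepts it only above a similarity threshold.

```python
import numpy as np

# Toy sketch of image-text match verification. All names and the
# threshold are illustrative assumptions, not the paper's actual method.

def embed_text(text: str, dim: int = 8) -> np.ndarray:
    """Deterministic toy text embedding (character-histogram based)."""
    v = np.zeros(dim)
    for ch in text:
        v[ord(ch) % dim] += 1.0
    return v / (np.linalg.norm(v) + 1e-8)

def match_score(image_emb: np.ndarray, text: str) -> float:
    """Cosine similarity between an image embedding and a text embedding."""
    t = embed_text(text, dim=image_emb.shape[0])
    denom = np.linalg.norm(image_emb) * np.linalg.norm(t) + 1e-8
    return float(image_emb @ t / denom)

def is_match(image_emb: np.ndarray, text: str, threshold: float = 0.9) -> bool:
    """Accept the transcription only if it scores above the threshold."""
    return match_score(image_emb, text) >= threshold

# Pretend the image's embedding equals the embedding of its true label:
img = embed_text("stop")
print(is_match(img, "stop"), is_match(img, "exit"))  # True False
```

A real system would learn both embeddings jointly; the point here is only the verification step, which filters out recognition hypotheses that disagree with the image.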
no code implementations • 21 Mar 2024 • Masato Fujitake
By leveraging the strengths of existing research in document image understanding and the superior language understanding capabilities of LLMs, the proposed model, fine-tuned on multimodal instruction datasets, handles document image understanding within a single model.
no code implementations • 28 Dec 2023 • Masato Fujitake
Therefore, we propose a deep reinforcement learning localization method for logo recognition (RL-LOGO).
no code implementations • 31 Oct 2023 • Yuki Okumura, Masato Fujitake
The FA team participated in the Table Data Extraction (TDE) and Text-to-Table Relationship Extraction (TTRE) tasks of the NTCIR-17 Understanding of Non-Financial Objects in Financial Reports (UFO).
no code implementations • 30 Aug 2023 • Masato Fujitake
Typical text recognition methods rely on an encoder-decoder structure, in which the encoder extracts features from an image, and the decoder produces recognized text from these features.
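The encoder-decoder pattern described above can be sketched minimally. This is a deliberately simplified illustration (the feature extraction, character set, and decoding rule are all invented assumptions, not any paper's architecture): an "encoder" collapses an image into a feature sequence, and a "decoder" maps those features to characters until an end token.

```python
import numpy as np

# Minimal toy of the encoder-decoder structure for text recognition.
# Real encoders are CNNs/Transformers and real decoders use attention;
# everything here is an invented illustration of the data flow only.

CHARSET = list("abc") + ["<eos>"]

def encode(image: np.ndarray) -> np.ndarray:
    """'Encoder': collapse the image into a per-column feature sequence."""
    return image.mean(axis=0)  # shape: (width,)

def decode(features: np.ndarray, max_len: int = 10) -> str:
    """'Decoder': map each feature to a character, stopping at <eos>."""
    out = []
    for f in features[:max_len]:
        idx = min(int(f * len(CHARSET)), len(CHARSET) - 1)
        ch = CHARSET[idx]
        if ch == "<eos>":
            break
        out.append(ch)
    return "".join(out)

image = np.array([[0.1, 0.4, 0.7, 0.95],
                  [0.1, 0.4, 0.7, 0.95]])
print(decode(encode(image)))  # abc
```

The separation matters because the two halves can be improved independently: a stronger encoder yields better features, while the decoder's stopping rule controls output length.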
no code implementations • 29 Jun 2023 • Masato Fujitake
This paper presents Diffusion Model for Scene Text Recognition (DiffusionSTR), an end-to-end text recognition framework using diffusion models for recognizing text in the wild.
Ranked #12 on Scene Text Recognition on IIIT5k
no code implementations • 21 Feb 2023 • Masato Fujitake
Scene text spotting is the task of simultaneously detecting text regions in natural scene images and recognizing their characters.
Ranked #1 on Text Spotting on SCUT-CTW1500
1 code implementation • IEEE Access 2022 • Masato Fujitake, Akihiro Sugimoto
In this paper, we propose Video Sparse Transformer with Attention-guided Memory (VSTAM), which enhances features element-wise before object candidate region detection.
Ranked #1 on Object Detection on UA-DETRAC
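The general idea of element-wise feature enhancement with a memory of past frame features can be sketched as follows. This is a hedged illustration of that idea only (the blending weights and fusion rule are invented assumptions, not VSTAM's actual attention-guided memory):

```python
import numpy as np

# Toy element-wise feature enhancement with a frame-feature memory.
# The affinity and fusion rules are invented for illustration and do
# not reproduce VSTAM's actual mechanism.

def enhance(current: np.ndarray, memory: list) -> np.ndarray:
    """Blend the current frame's feature map with remembered features,
    weighting each memory entry per element by its similarity."""
    if not memory:
        return current
    weights = [np.exp(-np.abs(current - m)) for m in memory]  # per-element affinity
    total = sum(weights)
    aggregated = sum(w * m for w, m in zip(weights, memory)) / total
    return 0.5 * current + 0.5 * aggregated  # element-wise fusion

feat_t = np.ones((2, 2))                       # current frame's features
mem = [np.ones((2, 2)) * 0.8, np.ones((2, 2)) * 1.2]  # remembered features
out = enhance(feat_t, mem)
print(out.shape)  # (2, 2)
```

Because the fusion happens per element rather than per frame, regions that agree with the memory are reinforced while disagreeing regions are only mildly adjusted, which is the intuition behind enhancing features before candidate region detection.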