Search Results for author: Ali Furkan Biten

Found 17 papers, 9 papers with code

Show, Interpret and Tell: Entity-aware Contextualised Image Captioning in Wikipedia

no code implementations • 21 Sep 2022 • Khanh Nguyen, Ali Furkan Biten, Andres Mafla, Lluis Gomez, Dimosthenis Karatzas

In particular, a similar Wikimedia image can be used to illustrate different articles, and the produced caption needs to be adapted to the specific context, which allows us to explore the limits of a model's ability to adjust captions to different contextual information.

Image Captioning

Out-of-Vocabulary Challenge Report

no code implementations • 14 Sep 2022 • Sergi Garcia-Bordils, Andrés Mafla, Ali Furkan Biten, Oren Nuriel, Aviad Aberdam, Shai Mazor, Ron Litman, Dimosthenis Karatzas

This paper presents the final results of the Out-Of-Vocabulary 2022 (OOV) challenge.

Optical Character Recognition (OCR) +1

MUST-VQA: MUltilingual Scene-text VQA

no code implementations • 14 Sep 2022 • Emanuele Vivoli, Ali Furkan Biten, Andres Mafla, Dimosthenis Karatzas, Lluis Gomez

In this paper, we present a framework for Multilingual Scene Text Visual Question Answering that deals with new languages in a zero-shot fashion.

Question Answering, Visual Question Answering

Text-DIAE: A Self-Supervised Degradation Invariant Autoencoders for Text Recognition and Document Enhancement

1 code implementation • 9 Mar 2022 • Mohamed Ali Souibgui, Sanket Biswas, Andres Mafla, Ali Furkan Biten, Alicia Fornés, Yousri Kessentini, Josep Lladós, Lluis Gomez, Dimosthenis Karatzas

In this paper, we propose a Text-Degradation Invariant Auto Encoder (Text-DIAE), a self-supervised model designed to tackle two tasks, text recognition (handwritten or scene-text) and document image enhancement.

Document Enhancement, Scene Text Recognition

OCR-IDL: OCR Annotations for Industry Document Library Dataset

1 code implementation • 25 Feb 2022 • Ali Furkan Biten, Rubèn Tito, Lluis Gomez, Ernest Valveny, Dimosthenis Karatzas

It is our hope that OCR-IDL can be a starting point for future works on Document Intelligence.

Optical Character Recognition (OCR)

LaTr: Layout-Aware Transformer for Scene-Text VQA

1 code implementation • CVPR 2022 • Ali Furkan Biten, Ron Litman, Yusheng Xie, Srikar Appalaraju, R. Manmatha

Accounting for this, we propose a single objective pre-training scheme that requires only text and spatial cues.

Optical Character Recognition (OCR), Question Answering +1

Is An Image Worth Five Sentences? A New Look into Semantics for Image-Text Matching

no code implementations • 6 Oct 2021 • Ali Furkan Biten, Andres Mafla, Lluis Gomez, Dimosthenis Karatzas

In this work, we propose two metrics that evaluate the degree of semantic relevance of retrieved items, independently of their annotated binary relevance.

Image Captioning, Image-text matching +2

Let there be a clock on the beach: Reducing Object Hallucination in Image Captioning

1 code implementation • 4 Oct 2021 • Ali Furkan Biten, Lluis Gomez, Dimosthenis Karatzas

Explaining an image with missing or non-existent objects is known as object bias (hallucination) in image captioning.

Hallucination, Image Captioning +1

Localizing Infinity-shaped fishes: Sketch-guided object localization in the wild

no code implementations • 24 Sep 2021 • Pau Riba, Sounak Dey, Ali Furkan Biten, Josep Llados

This work investigates the problem of sketch-guided object localization (SGOL), where human sketches are used as queries to conduct the object localization in natural images.

Instance Segmentation Object +4

One-shot Compositional Data Generation for Low Resource Handwritten Text Recognition

no code implementations • 11 May 2021 • Mohamed Ali Souibgui, Ali Furkan Biten, Sounak Dey, Alicia Fornés, Yousri Kessentini, Lluis Gomez, Dimosthenis Karatzas, Josep Lladós

Low-resource Handwritten Text Recognition (HTR) is a hard problem due to scarce annotated data and very limited linguistic information (dictionaries and language models).

Handwritten Text Recognition, HTR

Multi-Modal Reasoning Graph for Scene-Text Based Fine-Grained Image Classification and Retrieval

1 code implementation • 21 Sep 2020 • Andres Mafla, Sounak Dey, Ali Furkan Biten, Lluis Gomez, Dimosthenis Karatzas

Scene text instances found in natural images carry explicit semantic information that can provide important cues to solve a wide array of computer vision problems.

Fine-Grained Image Classification, General Classification +2

Multimodal grid features and cell pointers for Scene Text Visual Question Answering

no code implementations • 1 Jun 2020 • Lluís Gómez, Ali Furkan Biten, Rubèn Tito, Andrés Mafla, Marçal Rusiñol, Ernest Valveny, Dimosthenis Karatzas

This paper presents a new model for the task of scene text visual question answering, in which questions about a given image can only be answered by reading and understanding scene text that is present in it.

Question Answering, Visual Question Answering

Fine-grained Image Classification and Retrieval by Combining Visual and Locally Pooled Textual Features

2 code implementations • 14 Jan 2020 • Andres Mafla, Sounak Dey, Ali Furkan Biten, Lluis Gomez, Dimosthenis Karatzas

Text contained in an image carries high-level semantics that can be exploited to achieve richer image understanding.

Ranked #1 on Fine-Grained Image Classification on Con-Text

Classification, Fine-Grained Image Classification +5

ICDAR 2019 Competition on Scene Text Visual Question Answering

no code implementations • 30 Jun 2019 • Ali Furkan Biten, Rubèn Tito, Andres Mafla, Lluis Gomez, Marçal Rusiñol, Minesh Mathew, C. V. Jawahar, Ernest Valveny, Dimosthenis Karatzas

ST-VQA introduces an important aspect that is not addressed by any Visual Question Answering system to date, namely the incorporation of scene text to answer questions asked about an image.

Question Answering, Visual Question Answering

Selective Style Transfer for Text

1 code implementation • 4 Jun 2019 • Raul Gomez, Ali Furkan Biten, Lluis Gomez, Jaume Gibert, Marçal Rusiñol, Dimosthenis Karatzas

This paper explores the possibilities of image style transfer applied to text while maintaining the original transcriptions.

Data Augmentation, Scene Text Detection +2

Scene Text Visual Question Answering

3 code implementations • ICCV 2019 • Ali Furkan Biten, Ruben Tito, Andres Mafla, Lluis Gomez, Marçal Rusiñol, Ernest Valveny, C. V. Jawahar, Dimosthenis Karatzas

Current visual question answering datasets do not consider the rich semantic information conveyed by text within an image.

Question Answering, Visual Question Answering

Good News, Everyone! Context driven entity-aware captioning for news images

1 code implementation • CVPR 2019 • Ali Furkan Biten, Lluis Gomez, Marçal Rusiñol, Dimosthenis Karatzas

We propose a novel captioning method that is able to leverage contextual information provided by the text of news articles associated with an image.

Descriptive Image Captioning
