no code implementations • 30 Aug 2023 • Wenyi Wu, Karim Bouyarmane, Ismail Tutar
We present Catalog Phrase Grounding (CPG), a model that can associate product textual data (title, brands) with corresponding regions of product images (isolated product region, brand logo region) for e-commerce vision-language applications.
no code implementations • 12 Apr 2022 • Tarik Arici, Kushal Kumar, Hayreddin Çeker, Anoop S V K K Saladi, Ismail Tutar
Our model architecture consists of two subnetworks for the two subtasks: a classifier to predict the unit of measure (UoM) type (or the question) and an extractor to extract the relevant quantities.
no code implementations • 24 Sep 2021 • Tarik Arici, Mehmet Saygin Seyfioglu, Tal Neiman, Yi Xu, Son Train, Trishul Chilimbi, Belinda Zeng, Ismail Tutar
Vision-and-Language Pre-training (VLP) improves model performance for downstream tasks that require image and text inputs.