Generalized Referring Expression Comprehension
5 papers with code • 1 benchmark • 1 dataset
Generalized Referring Expression Comprehension (GREC) allows expressions that refer to any number of target objects, including none. GREC takes an image and a referring expression as input and requires predicting a bounding box for each target object.
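The input and output of the task described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the benchmark's official data format or metric: the `GrecSample` class, the corner-format boxes, the 0.5 IoU threshold, and the greedy matching in `is_exact_match` are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Assumed box format: (x1, y1, x2, y2) corner coordinates in pixels.
Box = Tuple[float, float, float, float]


@dataclass
class GrecSample:
    """Hypothetical container for one GREC example."""
    image_path: str
    expression: str
    # Zero, one, or many target boxes; an empty list means "no target".
    target_boxes: List[Box] = field(default_factory=list)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two corner-format boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def is_exact_match(pred: List[Box], gt: List[Box], thr: float = 0.5) -> bool:
    """Assumed criterion: every predicted box pairs with a distinct
    ground-truth box at IoU >= thr, and no box on either side is left over.
    Uses simple greedy matching, which is an illustrative simplification."""
    if len(pred) != len(gt):
        return False
    unmatched = list(gt)
    for p in pred:
        best = max(unmatched, key=lambda g: iou(p, g), default=None)
        if best is None or iou(p, best) < thr:
            return False
        unmatched.remove(best)
    return True


# A no-target expression is answered correctly by predicting no boxes.
sample = GrecSample("img.jpg", "the purple elephant")
print(is_exact_match([], sample.target_boxes))
```

Under these assumptions, a prediction for a no-target expression counts as correct only when the model outputs an empty list of boxes, and a multi-target prediction must cover every target without extras.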
Most implemented papers
MDETR -- Modulated Detection for End-to-End Multi-Modal Understanding
We also investigate the utility of our model as an object detector on a given label set when fine-tuned in a few-shot setting.
Multi-task Collaborative Network for Joint Referring Expression Comprehension and Segmentation
In addition, we address a key challenge in this multi-task setup, i.e., the prediction conflict, with two innovative designs, namely Consistency Energy Maximization (CEM) and Adaptive Soft Non-Located Suppression (ASNLS).
Vision-Language Transformer and Query Generation for Referring Segmentation
We introduce transformers and multi-head attention to build a network with an encoder-decoder attention architecture that "queries" the given image with the language expression.
Universal Instance Perception as Object Discovery and Retrieval
All instance perception tasks aim at finding certain objects specified by some queries such as category names, language expressions, and target annotations, but this complete field has been split into multiple independent subtasks.
GREC: Generalized Referring Expression Comprehension
This dataset encompasses a range of expressions: those referring to multiple targets, those with no specific target, and single-target expressions.