1 code implementation • 3 Aug 2022 • Jun Wang, Mingfei Gao, Yuqian Hu, Ramprasaath R. Selvaraju, Chetan Ramaiah, Ran Xu, Joseph F. JaJa, Larry S. Davis
To address this deficiency, we develop a new method that generates high-quality and diverse QA pairs by explicitly utilizing the rich text already available in the scene context of each image.
no code implementations • 23 Apr 2022 • Viraj Prabhu, Ramprasaath R. Selvaraju, Judy Hoffman, Nikhil Naik
Despite the rapid progress in deep visual recognition, modern computer vision datasets significantly overrepresent the developed world and models trained on such datasets underperform on images from unseen geographies.
1 code implementation • 14 Dec 2021 • Aman Shrivastava, Ramprasaath R. Selvaraju, Nikhil Naik, Vicente Ordonez
We propose CLIP-Lite, an information efficient method for visual representation learning by feature alignment with textual annotations.
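For context, the core pattern here is aligning image and text features in a shared embedding space. Below is a minimal sketch of a symmetric contrastive image-text alignment objective in PyTorch. It illustrates the generic alignment pattern only, not CLIP-Lite's exact objective (the paper's information efficiency comes from an objective that needs far fewer negative samples); all names and tensor shapes are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def image_text_alignment_loss(img_emb, txt_emb, temperature=0.07):
    """Symmetric InfoNCE over a batch of paired embeddings.

    img_emb, txt_emb: (B, D) outputs of separate image/text encoders
    (placeholders here); row i of each tensor is a matched pair.
    """
    img = F.normalize(img_emb, dim=-1)
    txt = F.normalize(txt_emb, dim=-1)
    logits = img @ txt.t() / temperature          # (B, B) cosine similarities
    targets = torch.arange(img.size(0), device=img.device)
    # Pull each image toward its own caption (and vice versa); push apart the rest.
    return 0.5 * (F.cross_entropy(logits, targets)
                  + F.cross_entropy(logits.t(), targets))
```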
no code implementations • 1 Dec 2021 • Brian Chen, Ramprasaath R. Selvaraju, Shih-Fu Chang, Juan Carlos Niebles, Nikhil Naik
In this work, we propose PreViTS, an SSL framework that utilizes an unsupervised tracking signal for selecting clips containing the same object, which helps better utilize temporal transformations of objects.
5 code implementations • NeurIPS 2021 • Junnan Li, Ramprasaath R. Selvaraju, Akhilesh Deepak Gotmare, Shafiq Joty, Caiming Xiong, Steven Hoi
Most existing methods employ a transformer-based multimodal encoder to jointly model visual tokens (region-based image features) and word tokens; this paper instead aligns the image and text representations before fusing them (a toy sketch of that ordering follows this entry).
Ranked #5 on Open Vocabulary Attribute Detection on OVAD-Box benchmark (using extra training data)
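Per its title, the paper's proposal (ALBEF, "Align before Fuse") is to contrastively align the unimodal image and text representations before a cross-attention encoder fuses them. The toy skeleton below illustrates only that ordering; every module, dimension, and name is an assumed placeholder, and the paper's full model (including its momentum distillation) is not reproduced here.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AlignBeforeFuse(nn.Module):
    """Toy align-before-fuse skeleton: contrastively align unimodal features,
    then fuse them with cross-attention. All modules here are stand-ins."""

    def __init__(self, img_dim=768, txt_dim=768, dim=256, heads=4):
        super().__init__()
        self.img_proj = nn.Linear(img_dim, dim)  # stand-in for a vision encoder head
        self.txt_proj = nn.Linear(txt_dim, dim)  # stand-in for a text encoder head
        self.fusion = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, img_tokens, txt_tokens):
        # img_tokens: (B, Ni, img_dim); txt_tokens: (B, Nt, txt_dim)
        v_tok = self.img_proj(img_tokens)
        t_tok = self.txt_proj(txt_tokens)

        # 1) Align: contrast pooled unimodal embeddings before any fusion.
        v = F.normalize(v_tok.mean(dim=1), dim=-1)          # (B, dim)
        t = F.normalize(t_tok.mean(dim=1), dim=-1)
        logits = v @ t.t() / 0.07
        targets = torch.arange(v.size(0), device=v.device)
        itc_loss = 0.5 * (F.cross_entropy(logits, targets)
                          + F.cross_entropy(logits.t(), targets))

        # 2) Fuse: text tokens cross-attend over the (now aligned) image tokens.
        fused, _ = self.fusion(query=t_tok, key=v_tok, value=v_tok)
        return fused, itc_loss
```

The design point is simply that the contrastive loss acts on the unimodal embeddings, so fusion starts from features that already agree across modalities.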
no code implementations • CVPR 2021 • Ramprasaath R. Selvaraju, Karan Desai, Justin Johnson, Nikhil Naik
Recent advances in self-supervised learning (SSL) have largely closed the gap with supervised ImageNet pretraining.
1 code implementation • NAACL 2021 • Sameer Dharur, Purva Tendulkar, Dhruv Batra, Devi Parikh, Ramprasaath R. Selvaraju
Recent research in Visual Question Answering (VQA) has revealed state-of-the-art models to be inconsistent in their understanding of the world -- they correctly answer seemingly difficult questions that require reasoning, but get simpler associated sub-questions wrong.
no code implementations • CVPR 2020 • Ramprasaath R. Selvaraju, Purva Tendulkar, Devi Parikh, Eric Horvitz, Marco Ribeiro, Besmira Nushi, Ece Kamar
We quantify the extent to which this phenomenon occurs by creating a new Reasoning split of the VQA dataset and collecting VQA-introspect, a new dataset consisting of 238K new perception questions, which serve as sub-questions corresponding to the set of perceptual tasks needed to effectively answer the complex reasoning questions in the Reasoning split.
1 code implementation • 19 Mar 2019 • Purva Tendulkar, Kalpesh Krishna, Ramprasaath R. Selvaraju, Devi Parikh
An approach to make text visually appealing and memorable is semantic reinforcement - the use of visual cues alluding to the context or theme in which the word is being used to reinforce the message (e.g., Google Doodles).
no code implementations • ICCV 2019 • Ramprasaath R. Selvaraju, Stefan Lee, Yilin Shen, Hongxia Jin, Shalini Ghosh, Larry Heck, Dhruv Batra, Devi Parikh
Many vision and language models suffer from poor visual grounding - often falling back on easy-to-learn language priors rather than basing their decisions on visual concepts in the image.
1 code implementation • ECCV 2018 • Ramprasaath R. Selvaraju, Prithvijit Chattopadhyay, Mohamed Elhoseiny, Tilak Sharma, Dhruv Batra, Devi Parikh, Stefan Lee
Our approach, which we call Neuron Importance-Aware Weight Transfer (NIWT), learns to map domain knowledge about novel "unseen" classes onto this dictionary of learned concepts and then optimizes for network parameters that can effectively combine these concepts - essentially learning classifiers by discovering and composing learned semantic concepts in deep networks.
2 code implementations • 22 Nov 2016 • Ramprasaath R. Selvaraju, Abhishek Das, Ramakrishna Vedantam, Michael Cogswell, Devi Parikh, Dhruv Batra
We propose a technique for making Convolutional Neural Network (CNN)-based models more transparent by visualizing input regions that are 'important' for predictions -- or visual explanations.
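Since this entry describes the method itself, a minimal sketch may help. Grad-CAM backpropagates a class score to the convolutional feature maps, pools the gradients into per-channel weights, and takes a ReLU of the weighted sum of activation maps. The PyTorch sketch below follows that recipe; the torchvision ResNet-50 and its layer4 block are illustrative choices, not prescribed by the paper.

```python
import torch
import torch.nn.functional as F
from torchvision import models

# Minimal Grad-CAM sketch. Model and target layer are illustrative choices.
model = models.resnet50(weights="IMAGENET1K_V1").eval()
activations, gradients = {}, {}

def fwd_hook(module, inputs, output):
    activations["feat"] = output

def bwd_hook(module, grad_input, grad_output):
    gradients["feat"] = grad_output[0]

# Hook the last convolutional block, whose spatial maps Grad-CAM reweights.
model.layer4.register_forward_hook(fwd_hook)
model.layer4.register_full_backward_hook(bwd_hook)

def grad_cam(image, class_idx=None):
    """image: (1, 3, H, W) normalized tensor; returns an (H, W) heatmap in [0, 1]."""
    logits = model(image)
    if class_idx is None:
        class_idx = logits.argmax(dim=1).item()
    model.zero_grad()
    logits[0, class_idx].backward()

    # Global-average-pool the gradients: one importance weight per channel.
    weights = gradients["feat"].mean(dim=(2, 3), keepdim=True)  # (1, C, 1, 1)
    # Weighted sum of activation maps; ReLU keeps positively contributing regions.
    cam = F.relu((weights * activations["feat"]).sum(dim=1, keepdim=True))
    cam = F.interpolate(cam, size=image.shape[2:], mode="bilinear",
                        align_corners=False)
    cam = (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)
    return cam[0, 0].detach()
```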
124 code implementations • ICCV 2017 • Ramprasaath R. Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, Dhruv Batra
For captioning and VQA, we show that even non-attention-based models can localize inputs.
1 code implementation • CVPR 2017 • Prithvijit Chattopadhyay, Ramakrishna Vedantam, Ramprasaath R. Selvaraju, Dhruv Batra, Devi Parikh
In this work, we build dedicated models for counting designed to tackle the large variance in counts, appearances, and scales of objects found in natural scenes.
Ranked #1 on Object Counting on COCO count-test