1 code implementation • 5 Feb 2024 • Xudong Wang, Trevor Darrell, Sai Saketh Rambhatla, Rohit Girdhar, Ishan Misra
Text-to-image diffusion models produce high quality images but do not offer control over individual instances in the image.
Ranked #3 on Conditional Text-to-Image Synthesis on COCO-MIG
no code implementations • 17 Nov 2023 • Rohit Girdhar, Mannat Singh, Andrew Brown, Quentin Duval, Samaneh Azadi, Sai Saketh Rambhatla, Akbar Shah, Xi Yin, Devi Parikh, Ishan Misra
We present Emu Video, a text-to-video generation model that factorizes the generation into two steps: first generating an image conditioned on the text, and then generating a video conditioned on the text and the generated image.
no code implementations • 17 Nov 2023 • Sai Saketh Rambhatla, Ishan Misra
In this work, we show that text-to-image generative models can be 'inverted' to assess their own text-image understanding capabilities in a completely automated manner.
Ranked #62 on Visual Reasoning on Winoground
no code implementations • ICCV 2023 • Sai Saketh Rambhatla, Ishan Misra, Rama Chellappa, Abhinav Shrivastava
In this work, we present Multiple Object localization with Self-supervised Transformers (MOST) that uses features of transformers trained using self-supervised learning to localize multiple objects in real world images.
no code implementations • ICCV 2023 • Saksham Suri, Sai Saketh Rambhatla, Rama Chellappa, Abhinav Shrivastava
On average, we improve by $2.6$, $3.9$ and $9.6$ mAP over previous state-of-the-art methods on three splits of increasing sparsity on COCO.
no code implementations • 26 Oct 2021 • Steven Schwarcz, Sai Saketh Rambhatla, Rama Chellappa
This architecture, which we call a Self-Denoising Neural Network (SDNN), can be applied easily to most modern convolutional neural architectures, and can be used as a supplement to many existing few-shot learning techniques.
no code implementations • 28 Jul 2021 • Sai Saketh Rambhatla, Michael Jones, Rama Chellappa
Boosting is a method for finding a highly accurate hypothesis by linearly combining many "weak" hypotheses, each of which may be only moderately accurate.
no code implementations • ICCV 2021 • Sai Saketh Rambhatla, Rama Chellappa, Abhinav Shrivastava
We tackle object category discovery, which is the problem of discovering and localizing novel objects in a large unlabeled dataset.
no code implementations • 9 Apr 2020 • Ankan Bansal, Sai Saketh Rambhatla, Abhinav Shrivastava, Rama Chellappa
The proposed method consists of a layout module which primes a visual module to predict the type of interaction between a human and an object.
1 code implementation • ICCV 2019 • Pirazh Khorramshahi, Amit Kumar, Neehar Peri, Sai Saketh Rambhatla, Jun-Cheng Chen, Rama Chellappa
In this paper, we present a novel dual-path adaptive attention model for vehicle re-identification (AAVER).
Vehicle Key-Point and Orientation Estimation • Vehicle Re-Identification
no code implementations • 5 Apr 2019 • Ankan Bansal, Sai Saketh Rambhatla, Abhinav Shrivastava, Rama Chellappa
We present an approach for detecting human-object interactions (HOIs) in images, based on the idea that humans interact with functionally similar objects in a similar manner.