no code implementations • 21 Mar 2024 • Saksham Suri, Matthew Walmer, Kamal Gupta, Abhinav Shrivastava
We present a simple self-supervised method to enhance the performance of ViT features for dense downstream tasks.
no code implementations • 7 Dec 2023 • Saksham Suri, Fanyi Xiao, Animesh Sinha, Sean Chang Culatana, Raghuraman Krishnamoorthi, Chenchen Zhu, Abhinav Shrivastava
In the long-tailed detection setting on LVIS, Gen2Det improves the performance on rare categories by a large margin while also significantly improving the performance on other categories; e.g., we see an improvement of 2.13 Box AP and 1.84 Mask AP over just training on real data on LVIS with Mask R-CNN.
1 code implementation • 18 Aug 2023 • Soumik Mukhopadhyay, Saksham Suri, Ravi Teja Gadde, Abhinav Shrivastava
We show results in both reconstruction (same audio-video inputs) and cross (different audio-video inputs) settings on the VoxCeleb2 and LRW datasets.
1 code implementation • CVPR 2023 • Matthew Walmer, Saksham Suri, Kamal Gupta, Abhinav Shrivastava
We compare ViTs trained through different methods of supervision, and show that they learn a diverse range of behaviors in terms of their attention, representations, and downstream performance.
no code implementations • ICCV 2023 • Saksham Suri, Sai Saketh Rambhatla, Rama Chellappa, Abhinav Shrivastava
On average, we improve by $2.6$, $3.9$ and $9.6$ mAP over previous state-of-the-art methods on three splits of increasing sparsity on COCO.
1 code implementation • ICCV 2021 • Sharath Girish, Saksham Suri, Saketh Rambhatla, Abhinav Shrivastava
Through extensive experiments, we show that our algorithm discovers unseen GANs with high accuracy and also generalizes to GANs trained on unseen real datasets.
no code implementations • ICCV 2021 • Moustafa Meshry, Saksham Suri, Larry S. Davis, Abhinav Shrivastava
In contrast, we propose to factorize the representation of a subject into its spatial and style components.
no code implementations • 18 Nov 2018 • Saksham Suri, Anush Sankaran, Mayank Vatsa, Richa Singh
In this paper, a novel framework is proposed which transfers fundamental visual features learnt from a generic image dataset to supplement a supervised face recognition model.
no code implementations • 11 Nov 2018 • Yao Zhu, Saksham Suri, Pranav Kulkarni, Yueru Chen, Jiali Duan, C.-C. Jay Kuo
An interpretable generative model for handwritten digit synthesis is proposed in this work.