no code implementations • 3 Apr 2024 • Gabriela Ben Melech Stan, Raanan Yehezkel Rohekar, Yaniv Gurwicz, Matthew Lyle Olson, Anahita Bhiwandiwalla, Estelle Aflalo, Chenfei Wu, Nan Duan, Shao-Yen Tseng, Vasudev Lal
In this work, we present a novel interactive application aimed towards understanding the internal mechanisms of large vision-language models.
no code implementations • 29 Mar 2024 • Phillip Howard, Anahita Bhiwandiwalla, Kathleen C. Fraser, Svetlana Kiritchenko
We comprehensively evaluate the text produced by different LVLMs under this counterfactual generation setting and find that social attributes such as race, gender, and physical characteristics depicted in input images can significantly influence toxicity and the generation of competency-associated words.
1 code implementation • 30 Nov 2023 • Phillip Howard, Avinash Madasu, Tiep Le, Gustavo Lujan Moreno, Anahita Bhiwandiwalla, Vasudev Lal
Our approach utilizes Stable Diffusion with cross attention control to produce sets of counterfactual image-text pairs that are highly similar in their depiction of a subject (e. g., a given occupation) while differing only in their depiction of intersectional social attributes (e. g., race & gender).
1 code implementation • 7 Oct 2023 • Avinash Madasu, Anahita Bhiwandiwalla, Vasudev Lal
We investigate 9 foundational image-text models on a diverse set of video tasks that include video action recognition (video AR), video retrieval (video RT), video question answering (video QA), video multiple choice (video MC) and video captioning (video CP).
1 code implementation • 31 May 2023 • Xiao Xu, Bei Li, Chenfei Wu, Shao-Yen Tseng, Anahita Bhiwandiwalla, Shachar Rosenman, Vasudev Lal, Wanxiang Che, Nan Duan
With only 4M VLP data, ManagerTower achieves superior performances on various downstream VL tasks, especially 79. 15% accuracy on VQAv2 Test-Std, 86. 56% IR@1 and 95. 64% TR@1 on Flickr30K.
no code implementations • ICLR 2020 • Léopold Cambier, Anahita Bhiwandiwalla, Ting Gong, Mehran Nekuii, Oguz H. Elibol, Hanlin Tang
This necessitates increased memory footprint and computational requirements for training.
no code implementations • 4 Oct 2019 • Kara Lamb, Garima Malhotra, Athanasios Vlontzos, Edward Wagstaff, Atılım Günes Baydin, Anahita Bhiwandiwalla, Yarin Gal, Alfredo Kalaitzis, Anthony Reina, Asti Bhatt
High energy particles originating from solar activity travel along the the Earth's magnetic field and interact with the atmosphere around the higher latitudes.
no code implementations • 3 Oct 2019 • Kara Lamb, Garima Malhotra, Athanasios Vlontzos, Edward Wagstaff, Atılım Günes Baydin, Anahita Bhiwandiwalla, Yarin Gal, Alfredo Kalaitzis, Anthony Reina, Asti Bhatt
We propose a novel architecture and loss function to predict 1 hour in advance the magnitude of phase scintillations within a time window of plus-minus 5 minutes with state-of-the-art performance.
no code implementations • ICLR Workshop LLD 2019 • Subarna Tripathi, Anahita Bhiwandiwalla, Alexei Bastidas, Hanlin Tang
Existing scene graph to image models have two stages: (1) a scene composition stage, and an (2) image generation stage.
no code implementations • 11 Jan 2019 • Subarna Tripathi, Anahita Bhiwandiwalla, Alexei Bastidas, Hanlin Tang
Generating realistic images from scene graphs asks neural networks to be able to reason about object relationships and compositionality.
Image Generation from Scene Graphs Open-Ended Question Answering +1
1 code implementation • 24 Jan 2018 • Scott Cyphers, Arjun K. Bansal, Anahita Bhiwandiwalla, Jayaram Bobba, Matthew Brookhart, Avijit Chakraborty, Will Constable, Christian Convey, Leona Cook, Omar Kanawi, Robert Kimball, Jason Knight, Nikolay Korovaiko, Varun Kumar, Yixing Lao, Christopher R. Lishka, Jaikrishnan Menon, Jennifer Myers, Sandeep Aswath Narayana, Adam Procter, Tristan J. Webb
The current approach, which we call "direct optimization", requires deep changes within each framework to improve the training performance for each hardware backend (CPUs, GPUs, FPGAs, ASICs) and requires $\mathcal{O}(fp)$ effort; where $f$ is the number of frameworks and $p$ is the number of platforms.