1 code implementation • NAACL (ACL) 2022 • Xinya Du, Zixuan Zhang, Sha Li, Pengfei Yu, Hongwei Wang, Tuan Lai, Xudong Lin, Ziqi Wang, Iris Liu, Ben Zhou, Haoyang Wen, Manling Li, Darryl Hannan, Jie Lei, Hyounghun Kim, Rotem Dror, Haoyu Wang, Michael Regan, Qi Zeng, Qing Lyu, Charles Yu, Carl Edwards, Xiaomeng Jin, Yizhu Jiao, Ghazaleh Kazeminejad, Zhenhailong Wang, Chris Callison-Burch, Mohit Bansal, Carl Vondrick, Jiawei Han, Dan Roth, Shih-Fu Chang, Martha Palmer, Heng Ji
We introduce RESIN-11, a new schema-guided event extraction and prediction framework that can be applied to a large variety of newsworthy scenarios.
1 code implementation • ACL 2022 • Xingyu Fu, Ben Zhou, Ishaan Chandratreya, Carl Vondrick, Dan Roth
Images often convey more to human eyes than their pixels alone, as we can infer, associate, and reason with contextual information from other sources to establish a more complete picture.
no code implementations • 23 May 2024 • Basile Van Hoorick, Rundi Wu, Ege Ozguroglu, Kyle Sargent, Ruoshi Liu, Pavel Tokmakov, Achal Dave, Changxi Zheng, Carl Vondrick
Accurate reconstruction of complex dynamic scenes from just a single viewpoint continues to be a challenging task in computer vision.
no code implementations • 15 Apr 2024 • Mia Chiquier, Utkarsh Mall, Carl Vondrick
To address these limitations, we present a novel method that discovers interpretable yet discriminative sets of attributes for visual recognition.
1 code implementation • 16 Mar 2024 • Haozhe Chen, Carl Vondrick, Chengzhi Mao
How do large language models (LLMs) obtain their answers?
1 code implementation • 15 Feb 2024 • Abdullah Hamdi, Luke Melas-Kyriazi, Jinjie Mai, Guocheng Qian, Ruoshi Liu, Carl Vondrick, Bernard Ghanem, Andrea Vedaldi
With the aid of a frequency-modulated loss, GES achieves competitive performance in novel-view synthesis benchmarks while requiring less than half the memory storage of Gaussian Splatting and increasing the rendering speed by up to 39%.
1 code implementation • 25 Jan 2024 • Ege Ozguroglu, Ruoshi Liu, Dídac Surís, Dian Chen, Achal Dave, Pavel Tokmakov, Carl Vondrick
We introduce pix2gestalt, a framework for zero-shot amodal segmentation, which learns to estimate the shape and appearance of whole objects that are only partially visible behind occlusions.
1 code implementation • 23 Jan 2024 • Chengzhi Mao, Carl Vondrick, Hao Wang, Junfeng Yang
We find that large language models (LLMs) are more likely to modify human-written text than AI-generated text when tasked with rewriting.
no code implementations • 12 Dec 2023 • Utkarsh Mall, Cheng Perng Phoo, Meilin Kelsey Liu, Carl Vondrick, Bharath Hariharan, Kavita Bala
We introduce a method to train vision-language models for remote-sensing images without using any textual annotations.
1 code implementation • 16 Oct 2023 • Haozhe Chen, Junfeng Yang, Carl Vondrick, Chengzhi Mao
Large-scale pre-trained vision foundation models, such as CLIP, have become de facto backbones for various vision tasks.
no code implementations • ICCV 2023 • Hongge Chen, Zhao Chen, Gregory P. Meyer, Dennis Park, Carl Vondrick, Ashish Shrivastava, Yuning Chai
We present SHIFT3D, a differentiable pipeline for generating 3D shapes that are structurally plausible yet challenging to 3D object detectors.
1 code implementation • NeurIPS 2023 • Sungduk Yu, Walter Hannah, Liran Peng, Jerry Lin, Mohamed Aziz Bhouri, Ritwik Gupta, Björn Lütjens, Justus Christopher Will, Gunnar Behrens, Julius Busecke, Nora Loose, Charles I Stern, Tom Beucler, Bryce Harrop, Benjamin R Hillman, Andrea Jenney, Savannah Ferretti, Nana Liu, Anima Anandkumar, Noah D Brenowitz, Veronika Eyring, Nicholas Geneva, Pierre Gentine, Stephan Mandt, Jaideep Pathak, Akshay Subramaniam, Carl Vondrick, Rose Yu, Laure Zanna, Tian Zheng, Ryan Abernathey, Fiaz Ahmed, David C Bader, Pierre Baldi, Elizabeth Barnes, Christopher Bretherton, Peter Caldwell, Wayne Chuang, Yilun Han, Yu Huang, Fernando Iglesias-Suarez, Sanket Jantre, Karthik Kashinath, Marat Khairoutdinov, Thorsten Kurth, Nicholas Lutsko, Po-Lun Ma, Griffin Mooers, J. David Neelin, David Randall, Sara Shamekh, Mark A Taylor, Nathan Urban, Janni Yuval, Guang Zhang, Michael Pritchard
The dataset is global in coverage, spans multiple years at high sampling frequency, and is designed such that resulting emulators are compatible with downstream coupling into operational climate simulators.
1 code implementation • 24 May 2023 • Rundi Wu, Ruoshi Liu, Carl Vondrick, Changxi Zheng
Specifically, we encode the input 3D textured shape into triplane feature maps that represent the signed distance and texture fields of the input.
1 code implementation • CVPR 2023 • Basile Van Hoorick, Pavel Tokmakov, Simon Stent, Jie Li, Carl Vondrick
Tracking objects with persistence in cluttered and dynamic environments remains a difficult challenge for computer vision systems.
no code implementations • CVPR 2023 • Ruoshi Liu, Carl Vondrick
The relatively hot temperature of the human body causes people to turn into long-wave infrared light sources.
no code implementations • ICCV 2023 • Arjun Mani, Ishaan Preetam Chandratreya, Elliot Creager, Carl Vondrick, Richard Zemel
Modeling the mechanics of fluid in complex scenes is vital to applications in design, graphics, and robotics.
1 code implementation • ICCV 2023 • Ruoshi Liu, Rundi Wu, Basile Van Hoorick, Pavel Tokmakov, Sergey Zakharov, Carl Vondrick
We introduce Zero-1-to-3, a framework for changing the camera viewpoint of an object given just a single RGB image.
1 code implementation • ICCV 2023 • Dídac Surís, Sachit Menon, Carl Vondrick
Answering visual queries is a complex task that requires both visual processing and reasoning.
Ranked #11 on Zero-Shot Video Question Answer on NExT-QA
1 code implementation • 26 Jan 2023 • Scott Geng, Revant Teotia, Purva Tendulkar, Sachit Menon, Carl Vondrick
We introduce a video framework for modeling the association between verbal and non-verbal communication during dyadic conversation.
no code implementations • CVPR 2023 • Ruoshi Liu, Sachit Menon, Chengzhi Mao, Dennis Park, Simon Stent, Carl Vondrick
Experiments and visualizations show that the method is able to generate multiple possible solutions that are consistent with the observation of the shadow.
2 code implementations • 14 Dec 2022 • Chengzhi Mao, Scott Geng, Junfeng Yang, Xin Wang, Carl Vondrick
We apply this training loss to two adaption methods, model finetuning and visual prompt tuning.
no code implementations • 13 Dec 2022 • Lingyu Zhang, Chengzhi Mao, Junfeng Yang, Carl Vondrick
Even under adaptive attacks where the adversary knows our defense, our algorithm is still effective.
1 code implementation • 12 Dec 2022 • Chengzhi Mao, Lingyu Zhang, Abhishek Joshi, Junfeng Yang, Hao Wang, Carl Vondrick
In this paper, we introduce a framework that uses the dense intrinsic constraints in natural images to robustify inference.
1 code implementation • CVPR 2023 • Chengzhi Mao, Revant Teotia, Amrutha Sundar, Sachit Menon, Junfeng Yang, Xin Wang, Carl Vondrick
We propose a ``doubly right'' object recognition benchmark, where the metric requires the model to simultaneously produce both the right labels as well as the right rationales.
no code implementations • 8 Dec 2022 • Sachit Menon, Ishaan Preetam Chandratreya, Carl Vondrick
Incidental supervision from language has become a popular approach for learning generic visual representations that can be prompted to perform many recognition tasks in computer vision.
no code implementations • ICCV 2023 • Mia Chiquier, Carl Vondrick
The dataset consists of 12.5 hours of synchronized video and surface electromyography (sEMG) data of 10 subjects performing various exercises.
no code implementations • 2 Dec 2022 • Hui Lu, Mia Chiquier, Carl Vondrick
We introduce a framework for navigating through cluttered environments by connecting multiple cameras together while simultaneously preserving privacy.
no code implementations • CVPR 2023 • Purva Tendulkar, Dídac Surís, Carl Vondrick
Towards this goal, we address the task of generating a virtual human -- hands and full body -- grasping everyday objects.
3 code implementations • 13 Oct 2022 • Sachit Menon, Carl Vondrick
By basing decisions on these descriptors, we can provide additional cues that encourage using the features we want to be used.
no code implementations • 4 Oct 2022 • Dídac Surís, Carl Vondrick
We introduce a representation learning framework for spatial trajectories.
no code implementations • 19 Jul 2022 • Sachit Menon, David Blei, Carl Vondrick
Variational autoencoders (VAEs) suffer from posterior collapse, where the powerful neural networks used for modeling and inference optimize the objective without meaningfully using the latent representation.
no code implementations • 17 Jun 2022 • Ruoshi Liu, Sachit Menon, Chengzhi Mao, Dennis Park, Simon Stent, Carl Vondrick
Experiments and visualizations show that the method is able to generate multiple possible solutions that are consistent with the observation of the shadow.
no code implementations • ICCV 2023 • Ruoshi Liu, Chengzhi Mao, Purva Tendulkar, Hao Wang, Carl Vondrick
Many machine learning methods operate by inverting a neural network at inference time, which has become a popular technique for solving inverse problems in computer vision, robotics, and graphics.
no code implementations • CVPR 2022 • Dídac Surís, Carl Vondrick, Bryan Russell, Justin Salamon
In order to capture the high-level concepts that are required to solve the task, we propose modeling the long-term temporal context of both the video and the music signals, using Transformer networks for each modality.
1 code implementation • CVPR 2022 • Chengzhi Mao, Kevin Xia, James Wang, Hao Wang, Junfeng Yang, Elias Bareinboim, Carl Vondrick
Visual representations underlie object recognition tasks, but they often contain both robust and non-robust features.
no code implementations • CVPR 2022 • Basile Van Hoorick, Purva Tendulkar, Dídac Surís, Dennis Park, Simon Stent, Carl Vondrick
For computer vision systems to operate in dynamic situations, they need to be able to represent and reason about object permanence.
1 code implementation • 1 Mar 2022 • Xingyu Fu, Ben Zhou, Ishaan Preetam Chandratreya, Carl Vondrick, Dan Roth
For example, in Figure 1, we can find a way to identify the news articles related to the picture through segment-wise understandings of the signs, the buildings, the crowds, and more.
1 code implementation • CVPR 2022 • Will Price, Carl Vondrick, Dima Damen
Our lives can be seen as a complex weaving of activities; we switch from one activity to another, to maximise our achievements or in reaction to demands placed upon us.
no code implementations • ICLR 2022 • Mia Chiquier, Chengzhi Mao, Carl Vondrick
Automatic speech recognition systems have created exciting possibilities for applications, however they also enable opportunities for systematic eavesdropping.
Automatic Speech Recognition (ASR) +1
1 code implementation • ICLR 2022 • Chengzhi Mao, Lu Jiang, Mostafa Dehghani, Carl Vondrick, Rahul Sukthankar, Irfan Essa
Vision Transformer (ViT) is emerging as the state-of-the-art architecture for image recognition.
Ranked #3 on Domain Generalization on Stylized-ImageNet
1 code implementation • 11 Nov 2021 • Boyuan Chen, Robert Kwiatkowski, Carl Vondrick, Hod Lipson
Internal computational models of physical bodies are fundamental to the ability of robots and animals alike to plan and control their actions.
1 code implementation • CVPR 2021 • Dídac Surís, Ruoshi Liu, Carl Vondrick
We introduce a framework for learning from unlabeled video what is predictable in the future.
Representation Learning • Self-Supervised Action Recognition +1
1 code implementation • NAACL 2021 • Haoyang Wen, Ying Lin, Tuan Lai, Xiaoman Pan, Sha Li, Xudong Lin, Ben Zhou, Manling Li, Haoyu Wang, Hongming Zhang, Xiaodong Yu, Alexander Dong, Zhenhailong Wang, Yi Fung, Piyush Mishra, Qing Lyu, Dídac Surís, Brian Chen, Susan Windisch Brown, Martha Palmer, Chris Callison-Burch, Carl Vondrick, Jiawei Han, Dan Roth, Shih-Fu Chang, Heng Ji
We present a new information extraction system that can automatically construct temporal event graphs from a collection of news documents from multiple sources, multiple languages (English and Spanish for our experiment), and multiple data modalities (speech, text, image and video).
1 code implementation • 17 May 2021 • Boyuan Chen, Mia Chiquier, Hod Lipson, Carl Vondrick
Due to the many ways that robots use containers, we believe the box will have a number of applications in robotics.
1 code implementation • ICCV 2021 • Chengzhi Mao, Mia Chiquier, Hao Wang, Junfeng Yang, Carl Vondrick
We find that images contain intrinsic structure that enables the reversal of many adversarial attacks.
no code implementations • 1 Jan 2021 • Bo Wu, Haoyu Qin, Alireza Zareian, Carl Vondrick, Shih-Fu Chang
Children acquire language subconsciously by observing the surrounding world and listening to descriptions.
1 code implementation • CVPR 2021 • Chengzhi Mao, Augustine Cha, Amogh Gupta, Hao Wang, Junfeng Yang, Carl Vondrick
We introduce a framework for learning robust visual representations that generalize to new viewpoints, backgrounds, and scene contexts.
Ranked #44 on Image Classification on ObjectNet (using extra training data)
1 code implementation • CVPR 2022 • Dídac Surís, Dave Epstein, Carl Vondrick
Machine translation between many languages at once is highly challenging, since training with ground truth requires supervision between all language pairs, which is difficult to obtain.
1 code implementation • ICCV 2021 • Basile Van Hoorick, Carl Vondrick
The elementary operation of cropping underpins nearly every computer vision system, ranging from data augmentation and translation invariance to computational photography and representation learning.
1 code implementation • NeurIPS 2020 • Ruilin Xu, Rundi Wu, Yuko Ishiwaka, Carl Vondrick, Changxi Zheng
We introduce a deep learning model for speech denoising, a long-standing challenge in audio analysis arising in numerous applications.
1 code implementation • ECCV 2020 • Alex Andonian, Camilo Fosco, Mathew Monfort, Allen Lee, Rogerio Feris, Carl Vondrick, Aude Oliva
This allows our model to perform cognitive tasks such as set abstraction (which general concept is in common among a set of videos?).
no code implementations • 22 Jul 2020 • Bo Wu, Haoyu Qin, Alireza Zareian, Carl Vondrick, Shih-Fu Chang
Children acquire language subconsciously by observing the surrounding world and listening to descriptions.
1 code implementation • ECCV 2020 • Chengzhi Mao, Amogh Gupta, Vikram Nitin, Baishakhi Ray, Shuran Song, Junfeng Yang, Carl Vondrick
Although deep networks achieve strong accuracy on a range of computer vision benchmarks, they remain vulnerable to adversarial attacks, where imperceptible input perturbations fool the network.
no code implementations • CVPR 2021 • Dave Epstein, Carl Vondrick
We introduce a framework that predicts the goals behind observable human action in video.
1 code implementation • ECCV 2020 • Dídac Surís, Dave Epstein, Heng Ji, Shih-Fu Chang, Carl Vondrick
Language acquisition is the process of learning words from the surrounding scene.
1 code implementation • CVPR 2020 • Dave Epstein, Boyuan Chen, Carl Vondrick
We train a supervised neural network as a baseline and analyze its performance compared to human consistency on the tasks.
no code implementations • 15 Oct 2019 • Boyuan Chen, Shuran Song, Hod Lipson, Carl Vondrick
We train embodied agents to play Visual Hide and Seek where a prey must navigate in a simulated environment in order to avoid capture from a predator.
1 code implementation • NeurIPS 2019 • Chengzhi Mao, Ziyuan Zhong, Junfeng Yang, Carl Vondrick, Baishakhi Ray
Deep networks are well-known to be fragile to adversarial attacks.
no code implementations • ICLR 2019 • Te-Lin Wu, Jaedong Hwang, Jingyun Yang, Shaofan Lai, Carl Vondrick, Joseph J. Lim
A noisy and diverse demonstration set may hinder the performance of an agent aiming to acquire certain skills via imitation learning.
no code implementations • CVPR 2019 • Chen Sun, Abhinav Shrivastava, Carl Vondrick, Rahul Sukthankar, Kevin Murphy, Cordelia Schmid
This paper focuses on multi-person action forecasting in videos.
3 code implementations • ICCV 2019 • Chen Sun, Austin Myers, Carl Vondrick, Kevin Murphy, Cordelia Schmid
Self-supervised learning has become increasingly important to leverage the abundance of unlabeled data available on platforms like YouTube.
Ranked #1 on Action Classification on YouCook2
1 code implementation • CVPR 2019 • Hassan Akbari, Svebor Karaman, Surabhi Bhargava, Brian Chen, Carl Vondrick, Shih-Fu Chang
Following dedicated non-linear mappings for visual features at each level, word, and sentence embeddings, we obtain multiple instantiations of our common semantic space in which comparisons between any target text and the visual content are performed with cosine similarity.
Ranked #1 on Phrase Grounding on ReferIt
1 code implementation • ECCV 2018 • Chen Sun, Abhinav Shrivastava, Carl Vondrick, Kevin Murphy, Rahul Sukthankar, Cordelia Schmid
A visualization of the learned relation features confirms that our approach is able to attend to the relevant relations for each action.
Ranked #15 on Action Recognition on AVA v2.1
1 code implementation • ECCV 2018 • Carl Vondrick, Abhinav Shrivastava, Alireza Fathi, Sergio Guadarrama, Kevin Murphy
We use large amounts of unlabeled video to learn models for visual tracking without manual human supervision.
2 code implementations • ECCV 2018 • Hang Zhao, Chuang Gan, Andrew Rouditchenko, Carl Vondrick, Josh Mcdermott, Antonio Torralba
We introduce PixelPlayer, a system that, by leveraging large amounts of unlabeled videos, learns to locate image regions which produce sounds and separate the input sounds into a set of components that represents the sound from each pixel.
4 code implementations • 9 Jan 2018 • Mathew Monfort, Alex Andonian, Bolei Zhou, Kandan Ramakrishnan, Sarah Adel Bargal, Tom Yan, Lisa Brown, Quanfu Fan, Dan Gutfreund, Carl Vondrick, Aude Oliva
We present the Moments in Time Dataset, a large-scale human-annotated collection of one million short videos corresponding to dynamic events unfolding within three seconds.
no code implementations • ICLR 2018 • Deniz Oktay, Carl Vondrick, Antonio Torralba
However, when a layer is removed, the model learns to produce a different image that still looks natural to an adversary, which it achieves by removing objects.
no code implementations • 5 Dec 2017 • Kexin Pei, Linjie Zhu, Yinzhi Cao, Junfeng Yang, Carl Vondrick, Suman Jana
Finally, we show that retraining using the safety violations detected by VeriVis can reduce the average number of violations by up to 60.2%.
no code implementations • ICCV 2017 • Adria Recasens, Carl Vondrick, Aditya Khosla, Antonio Torralba
In this paper, we present an approach for following gaze in video by predicting where a person (in the video) is looking even when the object is in a different frame.
no code implementations • CVPR 2017 • Carl Vondrick, Antonio Torralba
We present a model that generates the future by transforming pixels in the past.
1 code implementation • 3 Jun 2017 • Yusuf Aytar, Carl Vondrick, Antonio Torralba
We capitalize on large amounts of readily-available, synchronous data to learn deep discriminative representations shared across three major natural modalities: vision, sound and language.
8 code implementations • CVPR 2018 • Chunhui Gu, Chen Sun, David A. Ross, Carl Vondrick, Caroline Pantofaru, Yeqing Li, Sudheendra Vijayanarasimhan, George Toderici, Susanna Ricco, Rahul Sukthankar, Cordelia Schmid, Jitendra Malik
The AVA dataset densely annotates 80 atomic visual actions in 430 15-minute video clips, where actions are localized in space and time, resulting in 1.58M action labels with multiple labels per person occurring frequently.
Ranked #6 on Action Detection on UCF101-24
no code implementations • 9 Dec 2016 • Adrià Recasens, Carl Vondrick, Aditya Khosla, Antonio Torralba
In this paper, we present an approach for following gaze across views by predicting where a particular person is looking throughout a scene.
no code implementations • 4 Dec 2016 • Benjamin Eysenbach, Carl Vondrick, Antonio Torralba
We then create a representation of characters' beliefs for two tasks in human action understanding: predicting who is mistaken, and when they are mistaken.
no code implementations • 27 Oct 2016 • Yusuf Aytar, Lluis Castrejon, Carl Vondrick, Hamed Pirsiavash, Antonio Torralba
Our experiments suggest that our scene representation can help transfer representations across modalities for retrieval.
6 code implementations • NeurIPS 2016 • Yusuf Aytar, Carl Vondrick, Antonio Torralba
We learn rich natural sound representations by capitalizing on large amounts of unlabeled sound data collected in the wild.
no code implementations • NeurIPS 2016 • Carl Vondrick, Hamed Pirsiavash, Antonio Torralba
We capitalize on large amounts of unlabeled video in order to learn a model of scene dynamics for both video recognition tasks (e.g., action classification) and video generation tasks (e.g., future prediction).
no code implementations • CVPR 2016 • Lluis Castrejon, Yusuf Aytar, Carl Vondrick, Hamed Pirsiavash, Antonio Torralba
Our experiments suggest that our scene representation can help transfer representations across modalities for retrieval.
no code implementations • NeurIPS 2015 • Adria Recasens, Aditya Khosla, Carl Vondrick, Antonio Torralba
Humans have the remarkable ability to follow the gaze of other people to identify what they are looking at.
no code implementations • CVPR 2016 • Carl Vondrick, Hamed Pirsiavash, Antonio Torralba
The key idea behind our approach is that we can train deep networks to predict the visual representation of images in the future.
no code implementations • 5 Mar 2015 • Xiangxin Zhu, Carl Vondrick, Charless Fowlkes, Deva Ramanan
Datasets for training object recognition systems are steadily increasing in size.
1 code implementation • 19 Feb 2015 • Carl Vondrick, Aditya Khosla, Hamed Pirsiavash, Tomasz Malisiewicz, Antonio Torralba
We introduce algorithms to visualize feature spaces used by object detectors.
no code implementations • NeurIPS 2015 • Carl Vondrick, Hamed Pirsiavash, Aude Oliva, Antonio Torralba
Although the human visual system can recognize many concepts under challenging conditions, it still has some biases.
no code implementations • CVPR 2016 • Carl Vondrick, Deniz Oktay, Hamed Pirsiavash, Antonio Torralba
In this paper, we introduce the problem of predicting why a person has performed an action in images.
no code implementations • 11 Dec 2012 • Carl Vondrick, Aditya Khosla, Tomasz Malisiewicz, Antonio Torralba
By visualizing feature spaces, we can gain a more intuitive understanding of our detection systems.
no code implementations • NeurIPS 2011 • Carl Vondrick, Deva Ramanan
We introduce a novel active learning framework for video annotation.