1 code implementation • NAACL (ACL) 2022 • Xinya Du, Zixuan Zhang, Sha Li, Pengfei Yu, Hongwei Wang, Tuan Lai, Xudong Lin, Ziqi Wang, Iris Liu, Ben Zhou, Haoyang Wen, Manling Li, Darryl Hannan, Jie Lei, Hyounghun Kim, Rotem Dror, Haoyu Wang, Michael Regan, Qi Zeng, Qing Lyu, Charles Yu, Carl Edwards, Xiaomeng Jin, Yizhu Jiao, Ghazaleh Kazeminejad, Zhenhailong Wang, Chris Callison-Burch, Mohit Bansal, Carl Vondrick, Jiawei Han, Dan Roth, Shih-Fu Chang, Martha Palmer, Heng Ji
We introduce RESIN-11, a new schema-guided event extraction and prediction framework that can be applied to a large variety of newsworthy scenarios.
1 code implementation • ACL 2022 • Xingyu Fu, Ben Zhou, Ishaan Chandratreya, Carl Vondrick, Dan Roth
Images often convey more to human eyes than their pixels alone, as we can infer, associate, and reason with contextual information from other sources to establish a more complete picture.
no code implementations • 23 May 2024 • Basile Van Hoorick, Rundi Wu, Ege Ozguroglu, Kyle Sargent, Ruoshi Liu, Pavel Tokmakov, Achal Dave, Changxi Zheng, Carl Vondrick
Accurate reconstruction of complex dynamic scenes from just a single viewpoint continues to be a challenging task in computer vision.
no code implementations • 15 Apr 2024 • Mia Chiquier, Utkarsh Mall, Carl Vondrick
To address these limitations, we present a novel method that discovers interpretable yet discriminative sets of attributes for visual recognition.
1 code implementation • 16 Mar 2024 • Haozhe Chen, Carl Vondrick, Chengzhi Mao
How do large language models (LLMs) obtain their answers?
1 code implementation • 15 Feb 2024 • Abdullah Hamdi, Luke Melas-Kyriazi, Jinjie Mai, Guocheng Qian, Ruoshi Liu, Carl Vondrick, Bernard Ghanem, Andrea Vedaldi
With the aid of a frequency-modulated loss, GES achieves competitive performance in novel-view synthesis benchmarks while requiring less than half the memory storage of Gaussian Splatting and increasing the rendering speed by up to 39%.
1 code implementation • 25 Jan 2024 • Ege Ozguroglu, Ruoshi Liu, Dídac Surís, Dian Chen, Achal Dave, Pavel Tokmakov, Carl Vondrick
We introduce pix2gestalt, a framework for zero-shot amodal segmentation, which learns to estimate the shape and appearance of whole objects that are only partially visible behind occlusions.
1 code implementation • 23 Jan 2024 • Chengzhi Mao, Carl Vondrick, Hao Wang, Junfeng Yang
We find that large language models (LLMs) are more likely to modify human-written text than AI-generated text when tasked with rewriting.
no code implementations • 12 Dec 2023 • Utkarsh Mall, Cheng Perng Phoo, Meilin Kelsey Liu, Carl Vondrick, Bharath Hariharan, Kavita Bala
We introduce a method to train vision-language models for remote-sensing images without using any textual annotations.
1 code implementation • 16 Oct 2023 • Haozhe Chen, Junfeng Yang, Carl Vondrick, Chengzhi Mao
Large-scale pre-trained vision foundation models, such as CLIP, have become de facto backbones for various vision tasks.
no code implementations • ICCV 2023 • Hongge Chen, Zhao Chen, Gregory P. Meyer, Dennis Park, Carl Vondrick, Ashish Shrivastava, Yuning Chai
We present SHIFT3D, a differentiable pipeline for generating 3D shapes that are structurally plausible yet challenging to 3D object detectors.
1 code implementation • NeurIPS 2023 • Sungduk Yu, Walter Hannah, Liran Peng, Jerry Lin, Mohamed Aziz Bhouri, Ritwik Gupta, Björn Lütjens, Justus Christopher Will, Gunnar Behrens, Julius Busecke, Nora Loose, Charles I Stern, Tom Beucler, Bryce Harrop, Benjamin R Hillman, Andrea Jenney, Savannah Ferretti, Nana Liu, Anima Anandkumar, Noah D Brenowitz, Veronika Eyring, Nicholas Geneva, Pierre Gentine, Stephan Mandt, Jaideep Pathak, Akshay Subramaniam, Carl Vondrick, Rose Yu, Laure Zanna, Tian Zheng, Ryan Abernathey, Fiaz Ahmed, David C Bader, Pierre Baldi, Elizabeth Barnes, Christopher Bretherton, Peter Caldwell, Wayne Chuang, Yilun Han, Yu Huang, Fernando Iglesias-Suarez, Sanket Jantre, Karthik Kashinath, Marat Khairoutdinov, Thorsten Kurth, Nicholas Lutsko, Po-Lun Ma, Griffin Mooers, J. David Neelin, David Randall, Sara Shamekh, Mark A Taylor, Nathan Urban, Janni Yuval, Guang Zhang, Michael Pritchard
The dataset is global in coverage, spans multiple years at high sampling frequency, and is designed such that resulting emulators are compatible with downstream coupling into operational climate simulators.
1 code implementation • 24 May 2023 • Rundi Wu, Ruoshi Liu, Carl Vondrick, Changxi Zheng
Specifically, we encode the input 3D textured shape into triplane feature maps that represent the signed distance and texture fields of the input.
1 code implementation • CVPR 2023 • Basile Van Hoorick, Pavel Tokmakov, Simon Stent, Jie Li, Carl Vondrick
Tracking objects with persistence in cluttered and dynamic environments remains a difficult challenge for computer vision systems.
no code implementations • CVPR 2023 • Ruoshi Liu, Carl Vondrick
The relatively hot temperature of the human body causes people to turn into long-wave infrared light sources.
no code implementations • ICCV 2023 • Arjun Mani, Ishaan Preetam Chandratreya, Elliot Creager, Carl Vondrick, Richard Zemel
Modeling the mechanics of fluid in complex scenes is vital to applications in design, graphics, and robotics.
1 code implementation • ICCV 2023 • Ruoshi Liu, Rundi Wu, Basile Van Hoorick, Pavel Tokmakov, Sergey Zakharov, Carl Vondrick
We introduce Zero-1-to-3, a framework for changing the camera viewpoint of an object given just a single RGB image.
1 code implementation • ICCV 2023 • Dídac Surís, Sachit Menon, Carl Vondrick
Answering visual queries is a complex task that requires both visual processing and reasoning.
Ranked #11 on Zero-Shot Video Question Answer on NExT-QA
1 code implementation • 26 Jan 2023 • Scott Geng, Revant Teotia, Purva Tendulkar, Sachit Menon, Carl Vondrick
We introduce a video framework for modeling the association between verbal and non-verbal communication during dyadic conversation.
no code implementations • CVPR 2023 • Ruoshi Liu, Sachit Menon, Chengzhi Mao, Dennis Park, Simon Stent, Carl Vondrick
Experiments and visualizations show that the method is able to generate multiple possible solutions that are consistent with the observation of the shadow.
2 code implementations • 14 Dec 2022 • Chengzhi Mao, Scott Geng, Junfeng Yang, Xin Wang, Carl Vondrick
We apply this training loss to two adaption methods, model finetuning and visual prompt tuning.
no code implementations • 13 Dec 2022 • Lingyu Zhang, Chengzhi Mao, Junfeng Yang, Carl Vondrick
Even under adaptive attacks where the adversary knows our defense, our algorithm is still effective.
1 code implementation • 12 Dec 2022 • Chengzhi Mao, Lingyu Zhang, Abhishek Joshi, Junfeng Yang, Hao Wang, Carl Vondrick
In this paper, we introduce a framework that uses the dense intrinsic constraints in natural images to robustify inference.
1 code implementation • CVPR 2023 • Chengzhi Mao, Revant Teotia, Amrutha Sundar, Sachit Menon, Junfeng Yang, Xin Wang, Carl Vondrick
We propose a ``doubly right'' object recognition benchmark, where the metric requires the model to simultaneously produce both the right labels as well as the right rationales.
no code implementations • 8 Dec 2022 • Sachit Menon, Ishaan Preetam Chandratreya, Carl Vondrick
Incidental supervision from language has become a popular approach for learning generic visual representations that can be prompted to perform many recognition tasks in computer vision.
no code implementations • ICCV 2023 • Mia Chiquier, Carl Vondrick
The dataset consists of 12.5 hours of synchronized video and surface electromyography (sEMG) data of 10 subjects performing various exercises.
no code implementations • 2 Dec 2022 • Hui Lu, Mia Chiquier, Carl Vondrick
We introduce a framework for navigating through cluttered environments by connecting multiple cameras together while simultaneously preserving privacy.
no code implementations • CVPR 2023 • Purva Tendulkar, Dídac Surís, Carl Vondrick
Towards this goal, we address the task of generating a virtual human -- hands and full body -- grasping everyday objects.
3 code implementations • 13 Oct 2022 • Sachit Menon, Carl Vondrick
By basing decisions on these descriptors, we can provide additional cues that encourage using the features we want to be used.
no code implementations • 4 Oct 2022 • Dídac Surís, Carl Vondrick
We introduce a representation learning framework for spatial trajectories.
no code implementations • 19 Jul 2022 • Sachit Menon, David Blei, Carl Vondrick
Variational autoencoders (VAEs) suffer from posterior collapse, where the powerful neural networks used for modeling and inference optimize the objective without meaningfully using the latent representation.
no code implementations • 17 Jun 2022 • Ruoshi Liu, Sachit Menon, Chengzhi Mao, Dennis Park, Simon Stent, Carl Vondrick
Experiments and visualizations show that the method is able to generate multiple possible solutions that are consistent with the observation of the shadow.
no code implementations • ICCV 2023 • Ruoshi Liu, Chengzhi Mao, Purva Tendulkar, Hao Wang, Carl Vondrick
Many machine learning methods operate by inverting a neural network at inference time, which has become a popular technique for solving inverse problems in computer vision, robotics, and graphics.
no code implementations • CVPR 2022 • Dídac Surís, Carl Vondrick, Bryan Russell, Justin Salamon
In order to capture the high-level concepts that are required to solve the task, we propose modeling the long-term temporal context of both the video and the music signals, using Transformer networks for each modality.
1 code implementation • CVPR 2022 • Chengzhi Mao, Kevin Xia, James Wang, Hao Wang, Junfeng Yang, Elias Bareinboim, Carl Vondrick
Visual representations underlie object recognition tasks, but they often contain both robust and non-robust features.
no code implementations • CVPR 2022 • Basile Van Hoorick, Purva Tendulkar, Dídac Surís, Dennis Park, Simon Stent, Carl Vondrick
For computer vision systems to operate in dynamic situations, they need to be able to represent and reason about object permanence.
1 code implementation • 1 Mar 2022 • Xingyu Fu, Ben Zhou, Ishaan Preetam Chandratreya, Carl Vondrick, Dan Roth
For example, in Figure 1, we can find a way to identify the news articles related to the picture through segment-wise understandings of the signs, the buildings, the crowds, and more.
1 code implementation • CVPR 2022 • Will Price, Carl Vondrick, Dima Damen
Our lives can be seen as a complex weaving of activities; we switch from one activity to another, to maximise our achievements or in reaction to demands placed upon us.
no code implementations • ICLR 2022 • Mia Chiquier, Chengzhi Mao, Carl Vondrick
Automatic speech recognition systems have created exciting possibilities for applications, however they also enable opportunities for systematic eavesdropping.
Automatic Speech Recognition (ASR) +1
1 code implementation • ICLR 2022 • Chengzhi Mao, Lu Jiang, Mostafa Dehghani, Carl Vondrick, Rahul Sukthankar, Irfan Essa
Vision Transformer (ViT) is emerging as the state-of-the-art architecture for image recognition.
Ranked #3 on Domain Generalization on Stylized-ImageNet
1 code implementation • 11 Nov 2021 • Boyuan Chen, Robert Kwiatkowski, Carl Vondrick, Hod Lipson
Internal computational models of physical bodies are fundamental to the ability of robots and animals alike to plan and control their actions.
1 code implementation • CVPR 2021 • Dídac Surís, Ruoshi Liu, Carl Vondrick
We introduce a framework for learning from unlabeled video what is predictable in the future.
Representation Learning • Self-Supervised Action Recognition +1
1 code implementation • NAACL 2021 • Haoyang Wen, Ying Lin, Tuan Lai, Xiaoman Pan, Sha Li, Xudong Lin, Ben Zhou, Manling Li, Haoyu Wang, Hongming Zhang, Xiaodong Yu, Alexander Dong, Zhenhailong Wang, Yi Fung, Piyush Mishra, Qing Lyu, Dídac Surís, Brian Chen, Susan Windisch Brown, Martha Palmer, Chris Callison-Burch, Carl Vondrick, Jiawei Han, Dan Roth, Shih-Fu Chang, Heng Ji
We present a new information extraction system that can automatically construct temporal event graphs from a collection of news documents from multiple sources, multiple languages (English and Spanish for our experiment), and multiple data modalities (speech, text, image and video).
1 code implementation • 17 May 2021 • Boyuan Chen, Mia Chiquier, Hod Lipson, Carl Vondrick
Due to the many ways that robots use containers, we believe the box will have a number of applications in robotics.
1 code implementation • ICCV 2021 • Chengzhi Mao, Mia Chiquier, Hao Wang, Junfeng Yang, Carl Vondrick
We find that images contain intrinsic structure that enables the reversal of many adversarial attacks.
no code implementations • 1 Jan 2021 • Bo Wu, Haoyu Qin, Alireza Zareian, Carl Vondrick, Shih-Fu Chang
Children acquire language subconsciously by observing the surrounding world and listening to descriptions.
1 code implementation • CVPR 2021 • Chengzhi Mao, Augustine Cha, Amogh Gupta, Hao Wang, Junfeng Yang, Carl Vondrick
We introduce a framework for learning robust visual representations that generalize to new viewpoints, backgrounds, and scene contexts.
Ranked #44 on Image Classification on ObjectNet (using extra training data)
1 code implementation • CVPR 2022 • Dídac Surís, Dave Epstein, Carl Vondrick
Machine translation between many languages at once is highly challenging, since training with ground truth requires supervision between all language pairs, which is difficult to obtain.
1 code implementation • ICCV 2021 • Basile Van Hoorick, Carl Vondrick
The elementary operation of cropping underpins nearly every computer vision system, ranging from data augmentation and translation invariance to computational photography and representation learning.
1 code implementation • NeurIPS 2020 • Ruilin Xu, Rundi Wu, Yuko Ishiwaka, Carl Vondrick, Changxi Zheng
We introduce a deep learning model for speech denoising, a long-standing challenge in audio analysis arising in numerous applications.
1 code implementation • ECCV 2020 • Alex Andonian, Camilo Fosco, Mathew Monfort, Allen Lee, Rogerio Feris, Carl Vondrick, Aude Oliva
This allows our model to perform cognitive tasks such as set abstraction (which general concept is in common among a set of videos?).
no code implementations • 22 Jul 2020 • Bo Wu, Haoyu Qin, Alireza Zareian, Carl Vondrick, Shih-Fu Chang
Children acquire language subconsciously by observing the surrounding world and listening to descriptions.
1 code implementation • ECCV 2020 • Chengzhi Mao, Amogh Gupta, Vikram Nitin, Baishakhi Ray, Shuran Song, Junfeng Yang, Carl Vondrick
Although deep networks achieve strong accuracy on a range of computer vision benchmarks, they remain vulnerable to adversarial attacks, where imperceptible input perturbations fool the network.
no code implementations • CVPR 2021 • Dave Epstein, Carl Vondrick
We introduce a framework that predicts the goals behind observable human action in video.
1 code implementation • ECCV 2020 • Dídac Surís, Dave Epstein, Heng Ji, Shih-Fu Chang, Carl Vondrick
Language acquisition is the process of learning words from the surrounding scene.
1 code implementation • CVPR 2020 • Dave Epstein, Boyuan Chen, Carl Vondrick
We train a supervised neural network as a baseline and analyze its performance compared to human consistency on the tasks.
no code implementations • 15 Oct 2019 • Boyuan Chen, Shuran Song, Hod Lipson, Carl Vondrick
We train embodied agents to play Visual Hide and Seek where a prey must navigate in a simulated environment in order to avoid capture from a predator.
1 code implementation • NeurIPS 2019 • Chengzhi Mao, Ziyuan Zhong, Junfeng Yang, Carl Vondrick, Baishakhi Ray
Deep networks are well-known to be fragile to adversarial attacks.
no code implementations • ICLR 2019 • Te-Lin Wu, Jaedong Hwang, Jingyun Yang, Shaofan Lai, Carl Vondrick, Joseph J. Lim
A noisy and diverse demonstration set may hinder the performance of an agent aiming to acquire certain skills via imitation learning.
no code implementations • CVPR 2019 • Chen Sun, Abhinav Shrivastava, Carl Vondrick, Rahul Sukthankar, Kevin Murphy, Cordelia Schmid
This paper focuses on multi-person action forecasting in videos.
3 code implementations • ICCV 2019 • Chen Sun, Austin Myers, Carl Vondrick, Kevin Murphy, Cordelia Schmid
Self-supervised learning has become increasingly important to leverage the abundance of unlabeled data available on platforms like YouTube.
Ranked #1 on Action Classification on YouCook2
1 code implementation • CVPR 2019 • Hassan Akbari, Svebor Karaman, Surabhi Bhargava, Brian Chen, Carl Vondrick, Shih-Fu Chang
Following dedicated non-linear mappings for visual features at each level, word, and sentence embeddings, we obtain multiple instantiations of our common semantic space in which comparisons between any target text and the visual content are performed with cosine similarity.
Ranked #1 on Phrase Grounding on ReferIt
1 code implementation • ECCV 2018 • Chen Sun, Abhinav Shrivastava, Carl Vondrick, Kevin Murphy, Rahul Sukthankar, Cordelia Schmid
A visualization of the learned relation features confirms that our approach is able to attend to the relevant relations for each action.
Ranked #15 on Action Recognition on AVA v2.1
1 code implementation • ECCV 2018 • Carl Vondrick, Abhinav Shrivastava, Alireza Fathi, Sergio Guadarrama, Kevin Murphy
We use large amounts of unlabeled video to learn models for visual tracking without manual human supervision.
2 code implementations • ECCV 2018 • Hang Zhao, Chuang Gan, Andrew Rouditchenko, Carl Vondrick, Josh Mcdermott, Antonio Torralba
We introduce PixelPlayer, a system that, by leveraging large amounts of unlabeled videos, learns to locate image regions which produce sounds and separate the input sounds into a set of components that represents the sound from each pixel.
4 code implementations • 9 Jan 2018 • Mathew Monfort, Alex Andonian, Bolei Zhou, Kandan Ramakrishnan, Sarah Adel Bargal, Tom Yan, Lisa Brown, Quanfu Fan, Dan Gutfreund, Carl Vondrick, Aude Oliva
We present the Moments in Time Dataset, a large-scale human-annotated collection of one million short videos corresponding to dynamic events unfolding within three seconds.
no code implementations • ICLR 2018 • Deniz Oktay, Carl Vondrick, Antonio Torralba
However, when a layer is removed, the model learns to produce a different image that still looks natural to an adversary, which it achieves by removing objects.
no code implementations • 5 Dec 2017 • Kexin Pei, Linjie Zhu, Yinzhi Cao, Junfeng Yang, Carl Vondrick, Suman Jana
Finally, we show that retraining using the safety violations detected by VeriVis can reduce the average number of violations by up to 60.2%.
no code implementations • ICCV 2017 • Adria Recasens, Carl Vondrick, Aditya Khosla, Antonio Torralba
In this paper, we present an approach for following gaze in video by predicting where a person (in the video) is looking even when the object is in a different frame.
no code implementations • CVPR 2017 • Carl Vondrick, Antonio Torralba
We present a model that generates the future by transforming pixels in the past.
1 code implementation • 3 Jun 2017 • Yusuf Aytar, Carl Vondrick, Antonio Torralba
We capitalize on large amounts of readily-available, synchronous data to learn deep discriminative representations shared across three major natural modalities: vision, sound and language.
8 code implementations • CVPR 2018 • Chunhui Gu, Chen Sun, David A. Ross, Carl Vondrick, Caroline Pantofaru, Yeqing Li, Sudheendra Vijayanarasimhan, George Toderici, Susanna Ricco, Rahul Sukthankar, Cordelia Schmid, Jitendra Malik
The AVA dataset densely annotates 80 atomic visual actions in 430 15-minute video clips, where actions are localized in space and time, resulting in 1.58M action labels with multiple labels per person occurring frequently.
Ranked #6 on Action Detection on UCF101-24
no code implementations • 9 Dec 2016 • Adrià Recasens, Carl Vondrick, Aditya Khosla, Antonio Torralba
In this paper, we present an approach for following gaze across views by predicting where a particular person is looking throughout a scene.
no code implementations • 4 Dec 2016 • Benjamin Eysenbach, Carl Vondrick, Antonio Torralba
We then create a representation of characters' beliefs for two tasks in human action understanding: predicting who is mistaken, and when they are mistaken.
no code implementations • 27 Oct 2016 • Yusuf Aytar, Lluis Castrejon, Carl Vondrick, Hamed Pirsiavash, Antonio Torralba
Our experiments suggest that our scene representation can help transfer representations across modalities for retrieval.
6 code implementations • NeurIPS 2016 • Yusuf Aytar, Carl Vondrick, Antonio Torralba
We learn rich natural sound representations by capitalizing on large amounts of unlabeled sound data collected in the wild.
no code implementations • NeurIPS 2016 • Carl Vondrick, Hamed Pirsiavash, Antonio Torralba
We capitalize on large amounts of unlabeled video in order to learn a model of scene dynamics for both video recognition tasks (e.g., action classification) and video generation tasks (e.g., future prediction).
no code implementations • CVPR 2016 • Lluis Castrejon, Yusuf Aytar, Carl Vondrick, Hamed Pirsiavash, Antonio Torralba
Our experiments suggest that our scene representation can help transfer representations across modalities for retrieval.
no code implementations • NeurIPS 2015 • Adria Recasens, Aditya Khosla, Carl Vondrick, Antonio Torralba
Humans have the remarkable ability to follow the gaze of other people to identify what they are looking at.
no code implementations • CVPR 2016 • Carl Vondrick, Hamed Pirsiavash, Antonio Torralba
The key idea behind our approach is that we can train deep networks to predict the visual representation of images in the future.
no code implementations • 5 Mar 2015 • Xiangxin Zhu, Carl Vondrick, Charless Fowlkes, Deva Ramanan
Datasets for training object recognition systems are steadily increasing in size.
1 code implementation • 19 Feb 2015 • Carl Vondrick, Aditya Khosla, Hamed Pirsiavash, Tomasz Malisiewicz, Antonio Torralba
We introduce algorithms to visualize feature spaces used by object detectors.
no code implementations • NeurIPS 2015 • Carl Vondrick, Hamed Pirsiavash, Aude Oliva, Antonio Torralba
Although the human visual system can recognize many concepts under challenging conditions, it still has some biases.
no code implementations • CVPR 2016 • Carl Vondrick, Deniz Oktay, Hamed Pirsiavash, Antonio Torralba
In this paper, we introduce the problem of predicting why a person has performed an action in images.
no code implementations • 11 Dec 2012 • Carl Vondrick, Aditya Khosla, Tomasz Malisiewicz, Antonio Torralba
By visualizing feature spaces, we can gain a more intuitive understanding of our detection systems.
no code implementations • NeurIPS 2011 • Carl Vondrick, Deva Ramanan
We introduce a novel active learning framework for video annotation.