no code implementations • 22 Apr 2024 • Amirmojtaba Sabour, Sanja Fidler, Karsten Kreis
Diffusion models (DMs) have established themselves as the state-of-the-art generative modeling approach in the visual domain and beyond.
no code implementations • 21 Dec 2023 • Huan Ling, Seung Wook Kim, Antonio Torralba, Sanja Fidler, Karsten Kreis
We also propose a motion amplification mechanism as well as a new autoregressive synthesis scheme to generate and combine multiple 4D sequences for longer generation.
no code implementations • 28 Nov 2023 • Yufeng Zheng, Xueting Li, Koki Nagano, Sifei Liu, Karsten Kreis, Otmar Hilliges, Shalini De Mello
Large-scale diffusion generative models are greatly simplifying image, video and 3D asset creation from user-provided text prompts and images.
no code implementations • 22 Nov 2023 • Katja Schwarz, Seung Wook Kim, Jun Gao, Sanja Fidler, Andreas Geiger, Karsten Kreis
Then, we train a diffusion model in the 3D-aware latent space, thereby enabling synthesis of high-quality 3D-consistent image samples, outperforming recent state-of-the-art GAN-based methods.
no code implementations • ICCV 2023 • Tianshi Cao, Karsten Kreis, Sanja Fidler, Nicholas Sharp, Kangxue Yin
We present TexFusion (Texture Diffusion), a new method to synthesize textures for given 3D geometries, using large-scale text-guided image diffusion models.
no code implementations • ICCV 2023 • Daiqing Li, Huan Ling, Amlan Kar, David Acuna, Seung Wook Kim, Karsten Kreis, Antonio Torralba, Sanja Fidler
In this work, we introduce a self-supervised feature representation learning framework DreamTeacher that utilizes generative networks for pre-training downstream image backbones.
no code implementations • CVPR 2023 • Seung Wook Kim, Bradley Brown, Kangxue Yin, Karsten Kreis, Katja Schwarz, Daiqing Li, Robin Rombach, Antonio Torralba, Sanja Fidler
We first train a scene auto-encoder to express a set of image and pose pairs as a neural field, represented as density and feature voxel grids that can be projected to produce novel views of the scene.
2 code implementations • CVPR 2023 • Andreas Blattmann, Robin Rombach, Huan Ling, Tim Dockhorn, Seung Wook Kim, Sanja Fidler, Karsten Kreis
We first pre-train an LDM on images only; then, we turn the image generator into a video generator by introducing a temporal dimension to the latent space diffusion model and fine-tuning on encoded image sequences, i.e., videos.
Ranked #10 on Text-to-Video Generation on UCF-101
no code implementations • CVPR 2023 • Davis Rempe, Zhengyi Luo, Xue Bin Peng, Ye Yuan, Kris Kitani, Karsten Kreis, Sanja Fidler, Or Litany
We introduce a method for generating realistic pedestrian trajectories and full-body animations that can be controlled to meet user-defined goals.
no code implementations • 14 Feb 2023 • Jae Hyun Lim, Nikola B. Kovachki, Ricardo Baptista, Christopher Beckham, Kamyar Azizzadenesheli, Jean Kossaifi, Vikram Voleti, Jiaming Song, Karsten Kreis, Jan Kautz, Christopher Pal, Arash Vahdat, Anima Anandkumar
They consist of a forward process that perturbs input data with Gaussian white noise and a reverse process that learns a score function to generate samples by denoising.
no code implementations • 25 Nov 2022 • Karsten Kreis, Tim Dockhorn, Zihao Li, Ellen Zhong
The state-of-the-art method cryoDRGN uses a Variational Autoencoder (VAE) framework to learn a continuous distribution of protein structures from single particle cryo-EM imaging data.
1 code implementation • CVPR 2023 • Chen-Hsuan Lin, Jun Gao, Luming Tang, Towaki Takikawa, Xiaohui Zeng, Xun Huang, Karsten Kreis, Sanja Fidler, Ming-Yu Liu, Tsung-Yi Lin
DreamFusion has recently demonstrated the utility of a pre-trained text-to-image diffusion model to optimize Neural Radiance Fields (NeRF), achieving remarkable text-to-3D synthesis results.
Ranked #2 on Text to 3D on T$^3$Bench
2 code implementations • 2 Nov 2022 • Yogesh Balaji, Seungjun Nah, Xun Huang, Arash Vahdat, Jiaming Song, Qinsheng Zhang, Karsten Kreis, Miika Aittala, Timo Aila, Samuli Laine, Bryan Catanzaro, Tero Karras, Ming-Yu Liu
Therefore, in contrast to existing works, we propose to train an ensemble of text-to-image diffusion models specialized for different synthesis stages.
Ranked #14 on Text-to-Image Generation on MS COCO
1 code implementation • 18 Oct 2022 • Tim Dockhorn, Tianshi Cao, Arash Vahdat, Karsten Kreis
While modern machine learning models rely on increasingly large training datasets, data is often limited in privacy-sensitive domains.
2 code implementations • 12 Oct 2022 • Xiaohui Zeng, Arash Vahdat, Francis Williams, Zan Gojcic, Or Litany, Sanja Fidler, Karsten Kreis
To advance 3D DDMs and make them useful for digital artists, we require (i) high generation quality, (ii) flexibility for manipulation and applications such as conditional synthesis and shape interpolation, and (iii) the ability to output smooth surfaces or meshes.
Ranked #1 on Point Cloud Generation on ShapeNet Airplane
1 code implementation • 11 Oct 2022 • Tim Dockhorn, Arash Vahdat, Karsten Kreis
Synthesis amounts to solving a differential equation (DE) defined by the learnt model.
Ranked #5 on Image Generation on AFHQV2
no code implementations • CVPR 2022 • Seung Wook Kim, Karsten Kreis, Daiqing Li, Antonio Torralba, Sanja Fidler
Modern image generative models show remarkable sample quality when trained on a single domain or class of objects.
Generative Adversarial Network • Image-to-Image Translation
no code implementations • 8 Feb 2022 • Cinjon Resnick, Or Litany, Amlan Kar, Karsten Kreis, James Lucas, Kyunghyun Cho, Sanja Fidler
Our main contribution is a pseudo-automatic method to discover such groups in foresight by performing causal interventions on simulated scenes.
no code implementations • CVPR 2022 • Daiqing Li, Huan Ling, Seung Wook Kim, Karsten Kreis, Adela Barriuso, Sanja Fidler, Antonio Torralba
By training an effective feature segmentation architecture on top of BigGAN, we turn BigGAN into a labeled dataset generator.
5 code implementations • ICLR 2022 • Zhisheng Xiao, Karsten Kreis, Arash Vahdat
To the best of our knowledge, denoising diffusion GAN is the first model that reduces sampling cost in diffusion models to an extent that allows them to be applied to real-world applications inexpensively.
Ranked #9 on Image Generation on CelebA-HQ 256x256
1 code implementation • ICLR 2022 • Tim Dockhorn, Arash Vahdat, Karsten Kreis
SGMs rely on a diffusion process that gradually perturbs the data towards a tractable distribution, while the generative model learns to denoise.
Ranked #24 on Image Generation on CIFAR-10
no code implementations • NeurIPS 2021 • Tianshi Cao, Alex Bie, Arash Vahdat, Sanja Fidler, Karsten Kreis
Generative models trained with privacy constraints on private data can sidestep this challenge, providing indirect access to private data instead.
1 code implementation • NeurIPS 2021 • Huan Ling, Karsten Kreis, Daiqing Li, Seung Wook Kim, Antonio Torralba, Sanja Fidler
EditGAN builds on a GAN framework that jointly models images and their semantic segmentations, requiring only a handful of labeled examples, making it a scalable tool for editing.
1 code implementation • 1 Nov 2021 • Tianshi Cao, Alex Bie, Arash Vahdat, Sanja Fidler, Karsten Kreis
Generative models trained with privacy constraints on private data can sidestep this challenge, providing indirect access to private data instead.
1 code implementation • NeurIPS 2021 • Despoina Paschalidou, Amlan Kar, Maria Shugrina, Karsten Kreis, Andreas Geiger, Sanja Fidler
The ability to synthesize realistic and diverse indoor furniture layouts, automatically or based on partial input, unlocks many applications, from better interactive 3D tools to data synthesis for training and simulation.
Ranked #3 on Indoor Scene Synthesis on PRO-teXt
2D Semantic Segmentation task 1 (8 classes) • 3D Semantic Scene Completion
no code implementations • 29 Sep 2021 • Cinjon Resnick, Or Litany, Amlan Kar, Karsten Kreis, James Lucas, Kyunghyun Cho, Sanja Fidler
We verify that the prioritized groups found via intervention are challenging for the object detector and show that retraining with data collected from these groups helps inordinately compared to adding more IID data.
1 code implementation • NeurIPS 2021 • Arash Vahdat, Karsten Kreis, Jan Kautz
Moving from data to latent space allows us to train more expressive generative models, apply SGMs to non-continuous data, and learn smoother SGMs in a smaller space, resulting in fewer network evaluations and faster sampling.
Ranked #3 on Image Generation on CIFAR-10 (FD metric)
no code implementations • CVPR 2021 • Daiqing Li, Junlin Yang, Karsten Kreis, Antonio Torralba, Sanja Fidler
Training deep networks with limited labeled data while achieving a strong generalization ability is key in the quest to reduce human annotation efforts.
2 code implementations • CVPR 2021 • Towaki Takikawa, Joey Litalien, Kangxue Yin, Karsten Kreis, Charles Loop, Derek Nowrouzezahrai, Alec Jacobson, Morgan McGuire, Sanja Fidler
We introduce an efficient neural representation that, for the first time, enables real-time rendering of high-fidelity neural SDFs, while achieving state-of-the-art geometry reconstruction quality.
no code implementations • 1 Jan 2021 • Tianshi Cao, Alex Bie, Karsten Kreis, Sanja Fidler
Generative models trained with privacy constraints on private data can sidestep this challenge and provide indirect access to the private data instead.
no code implementations • NeurIPS 2020 • Huan Ling, David Acuna, Karsten Kreis, Seung Wook Kim, Sanja Fidler
In images of complex scenes, objects often occlude each other, which makes perception tasks such as object detection and tracking, or robotic control tasks such as planning, challenging.
1 code implementation • ICLR 2021 • Zhisheng Xiao, Karsten Kreis, Jan Kautz, Arash Vahdat
VAEBM captures the overall mode structure of the data distribution using a state-of-the-art VAE, and it relies on its EBM component to explicitly exclude non-data-like regions from the model and refine the image samples.
Ranked #1 on Image Generation on Stacked MNIST