Search Results for author: Sergey Tulyakov

Found 75 papers, 33 papers with code

MoA: Mixture-of-Attention for Subject-Context Disentanglement in Personalized Image Generation

no code implementations • 17 Apr 2024 • Kuan-Chieh Wang, Daniil Ostashev, Yuwei Fang, Sergey Tulyakov, Kfir Aberman

MoA is designed to retain the original model's prior by fixing its attention layers in the prior branch, while minimally intervening in the generation process with the personalized branch that learns to embed subjects in the layout and context generated by the prior branch.

Disentanglement Image Generation

TextCraftor: Your Text Encoder Can be Image Quality Controller

no code implementations • 27 Mar 2024 • Yanyu Li, Xian Liu, Anil Kag, Ju Hu, Yerlan Idelbayev, Dhritiman Sagar, Yanzhi Wang, Sergey Tulyakov, Jian Ren

Our findings reveal that, instead of replacing the CLIP text encoder used in Stable Diffusion with other large language models, we can enhance it through our proposed fine-tuning approach, TextCraftor, leading to substantial improvements in quantitative benchmarks and human assessments.

Image Generation

TC4D: Trajectory-Conditioned Text-to-4D Generation

no code implementations • 26 Mar 2024 • Sherwin Bahmani, Xian Liu, Yifan Wang, Ivan Skorokhodov, Victor Rong, Ziwei Liu, Xihui Liu, Jeong Joon Park, Sergey Tulyakov, Gordon Wetzstein, Andrea Tagliasacchi, David B. Lindell

We learn local deformations that conform to the global trajectory using supervision from a text-to-video model.

Scene Generation Video Generation

MyVLM: Personalizing VLMs for User-Specific Queries

no code implementations • 21 Mar 2024 • Yuval Alaluf, Elad Richardson, Sergey Tulyakov, Kfir Aberman, Daniel Cohen-Or

To effectively recognize a variety of user-specific concepts, we augment the VLM with external concept heads that function as toggles for the model, enabling the VLM to identify the presence of specific target concepts in a given image.

Image Captioning Language Modelling +2

Panda-70M: Captioning 70M Videos with Multiple Cross-Modality Teachers

no code implementations • 29 Feb 2024 • Tsai-Shien Chen, Aliaksandr Siarohin, Willi Menapace, Ekaterina Deyneka, Hsiang-wei Chao, Byung Eun Jeon, Yuwei Fang, Hsin-Ying Lee, Jian Ren, Ming-Hsuan Yang, Sergey Tulyakov

Next, we finetune a retrieval model on a small subset where the best caption of each video is manually selected and then employ the model in the whole dataset to select the best caption as the annotation.

Retrieval Text Retrieval +3

Evaluating Very Long-Term Conversational Memory of LLM Agents

no code implementations • 27 Feb 2024 • Adyasha Maharana, Dong-Ho Lee, Sergey Tulyakov, Mohit Bansal, Francesco Barbieri, Yuwei Fang

Using this pipeline, we collect LoCoMo, a dataset of very long-term conversations, each encompassing 300 turns and 9K tokens on average, over up to 35 sessions.

Avg Multi-modal Dialogue Generation +1

Snap Video: Scaled Spatiotemporal Transformers for Text-to-Video Synthesis

no code implementations • 22 Feb 2024 • Willi Menapace, Aliaksandr Siarohin, Ivan Skorokhodov, Ekaterina Deyneka, Tsai-Shien Chen, Anil Kag, Yuwei Fang, Aleksei Stoliar, Elisa Ricci, Jian Ren, Sergey Tulyakov

Since video content is highly redundant, we argue that naively bringing advances of image models to the video generation domain reduces motion fidelity and visual quality and impairs scalability.

Ranked #1 on Text-to-Video Generation on MSR-VTT

Image Generation Text-to-Video Generation +1

Visual Concept-driven Image Generation with Text-to-Image Diffusion Model

no code implementations • 18 Feb 2024 • Tanzila Rahman, Shweta Mahajan, Hsin-Ying Lee, Jian Ren, Sergey Tulyakov, Leonid Sigal

We illustrate that such joint alternating refinement leads to the learning of better tokens for concepts and, as a by-product, latent masks.

Image Generation

SPAD : Spatially Aware Multiview Diffusers

no code implementations • 7 Feb 2024 • Yash Kant, Ziyi Wu, Michael Vasilkovsky, Guocheng Qian, Jian Ren, Riza Alp Guler, Bernard Ghanem, Sergey Tulyakov, Igor Gilitschenski, Aliaksandr Siarohin

We present SPAD, a novel approach for creating consistent multi-view images from text prompts or single images.

3D Generation Novel View Synthesis +1

AToM: Amortized Text-to-Mesh using 2D Diffusion

no code implementations • 1 Feb 2024 • Guocheng Qian, Junli Cao, Aliaksandr Siarohin, Yash Kant, Chaoyang Wang, Michael Vasilkovsky, Hsin-Ying Lee, Yuwei Fang, Ivan Skorokhodov, Peiye Zhuang, Igor Gilitschenski, Jian Ren, Bernard Ghanem, Kfir Aberman, Sergey Tulyakov

We introduce Amortized Text-to-Mesh (AToM), a feed-forward text-to-mesh framework optimized across multiple text prompts simultaneously.

Text to 3D

E²GAN: Efficient Training of Efficient GANs for Image-to-Image Translation

no code implementations • 11 Jan 2024 • Yifan Gong, Zheng Zhan, Qing Jin, Yanyu Li, Yerlan Idelbayev, Xian Liu, Andrey Zharkov, Kfir Aberman, Sergey Tulyakov, Yanzhi Wang, Jian Ren

One highly promising direction for enabling flexible real-time on-device image editing is utilizing data distillation by leveraging large-scale text-to-image diffusion models, such as Stable Diffusion, to generate paired datasets used for training generative adversarial networks (GANs).

Image-to-Image Translation

Diffusion Priors for Dynamic View Synthesis from Monocular Videos

no code implementations • 10 Jan 2024 • Chaoyang Wang, Peiye Zhuang, Aliaksandr Siarohin, Junli Cao, Guocheng Qian, Hsin-Ying Lee, Sergey Tulyakov

Dynamic novel view synthesis aims to capture the temporal evolution of visual content within videos.

Novel View Synthesis

Virtual Pets: Animatable Animal Generation in 3D Scenes

no code implementations • 21 Dec 2023 • Yen-Chi Cheng, Chieh Hubert Lin, Chaoyang Wang, Yash Kant, Sergey Tulyakov, Alexander Schwing, LiangYan Gui, Hsin-Ying Lee

Toward unlocking the potential of generative models in immersive 4D experiences, we introduce Virtual Pet, a novel pipeline to model realistic and diverse motions for target animal species within a 3D environment.

SceneWiz3D: Towards Text-guided 3D Scene Composition

no code implementations • 13 Dec 2023 • Qihang Zhang, Chaoyang Wang, Aliaksandr Siarohin, Peiye Zhuang, Yinghao Xu, Ceyuan Yang, Dahua Lin, Bolei Zhou, Sergey Tulyakov, Hsin-Ying Lee

We are witnessing significant breakthroughs in the technology for generating 3D objects from text.

Text to 3D

UpFusion: Novel View Diffusion from Unposed Sparse View Observations

no code implementations • 11 Dec 2023 • Bharath Raj Nagoor Kani, Hsin-Ying Lee, Sergey Tulyakov, Shubham Tulsiani

We propose UpFusion, a system that can perform novel view synthesis and infer 3D representations for an object given a sparse set of reference images without corresponding pose information.

Novel View Synthesis

4D-fy: Text-to-4D Generation Using Hybrid Score Distillation Sampling

no code implementations • 29 Nov 2023 • Sherwin Bahmani, Ivan Skorokhodov, Victor Rong, Gordon Wetzstein, Leonidas Guibas, Peter Wonka, Sergey Tulyakov, Jeong Joon Park, Andrea Tagliasacchi, David B. Lindell

Recent breakthroughs in text-to-4D generation rely on pre-trained text-to-image and text-to-video models to generate dynamic 3D scenes.

SceneTex: High-Quality Texture Synthesis for Indoor Scenes via Diffusion Priors

no code implementations • 28 Nov 2023 • Dave Zhenyu Chen, Haoxuan Li, Hsin-Ying Lee, Sergey Tulyakov, Matthias Nießner

We propose SceneTex, a novel method for effectively generating high-quality and style-consistent textures for indoor scenes using depth-to-image diffusion priors.

Decoder Texture Synthesis

iNVS: Repurposing Diffusion Inpainters for Novel View Synthesis

no code implementations • 24 Oct 2023 • Yash Kant, Aliaksandr Siarohin, Michael Vasilkovsky, Riza Alp Guler, Jian Ren, Sergey Tulyakov, Igor Gilitschenski

Our approach focuses on maximizing the reuse of visible pixels from the source image.

Novel View Synthesis

HyperHuman: Hyper-Realistic Human Generation with Latent Structural Diffusion

no code implementations • 12 Oct 2023 • Xian Liu, Jian Ren, Aliaksandr Siarohin, Ivan Skorokhodov, Yanyu Li, Dahua Lin, Xihui Liu, Ziwei Liu, Sergey Tulyakov

Our model enforces the joint learning of image appearance, spatial relationship, and geometry in a unified network, where each branch in the model complements the others with both structural awareness and textural richness.

Image Generation

AutoDecoding Latent 3D Diffusion Models

1 code implementation • NeurIPS 2023 • Evangelos Ntavelis, Aliaksandr Siarohin, Kyle Olszewski, Chaoyang Wang, Luc van Gool, Sergey Tulyakov

We present a novel approach to the generation of static and articulated 3D assets that has a 3D autodecoder at its core.

131

Text-Guided Synthesis of Eulerian Cinemagraphs

1 code implementation • 6 Jul 2023 • Aniruddha Mahapatra, Aliaksandr Siarohin, Hsin-Ying Lee, Sergey Tulyakov, Jun-Yan Zhu

We introduce Text2Cinemagraph, a fully automated method for creating cinemagraphs from text descriptions - an especially challenging task when prompts feature imaginary elements and artistic styles, given the complexity of interpreting the semantics and motions of these images.

Image Animation

346

Magic123: One Image to High-Quality 3D Object Generation Using Both 2D and 3D Diffusion Priors

1 code implementation • 30 Jun 2023 • Guocheng Qian, Jinjie Mai, Abdullah Hamdi, Jian Ren, Aliaksandr Siarohin, Bing Li, Hsin-Ying Lee, Ivan Skorokhodov, Peter Wonka, Sergey Tulyakov, Bernard Ghanem

We present Magic123, a two-stage coarse-to-fine approach for generating high-quality, textured 3D meshes from a single unposed image in the wild using both 2D and 3D priors.

Image to 3D

1,469

SnapFusion: Text-to-Image Diffusion Model on Mobile Devices within Two Seconds

no code implementations • NeurIPS 2023 • Yanyu Li, Huan Wang, Qing Jin, Ju Hu, Pavlo Chemerys, Yun Fu, Yanzhi Wang, Sergey Tulyakov, Jian Ren

We achieve this by introducing an efficient network architecture and improving step distillation.

Decoder Denoising

Promptable Game Models: Text-Guided Game Simulation via Masked Diffusion Models

no code implementations • 23 Mar 2023 • Willi Menapace, Aliaksandr Siarohin, Stéphane Lathuilière, Panos Achlioptas, Vladislav Golyanik, Sergey Tulyakov, Elisa Ricci

Most captivatingly, our PGM unlocks the director's mode, where the game is played by specifying goals for the agents in the form of a prompt.

Navigate

Text2Tex: Text-driven Texture Synthesis via Diffusion Models

no code implementations • ICCV 2023 • Dave Zhenyu Chen, Yawar Siddiqui, Hsin-Ying Lee, Sergey Tulyakov, Matthias Nießner

We present Text2Tex, a novel method for generating high-quality textures for 3D meshes from the given text prompts.

Texture Synthesis

3D generation on ImageNet

no code implementations • 2 Mar 2023 • Ivan Skorokhodov, Aliaksandr Siarohin, Yinghao Xu, Jian Ren, Hsin-Ying Lee, Peter Wonka, Sergey Tulyakov

Existing 3D-from-2D generators are typically designed for well-curated single-category datasets, where all the objects have (approximately) the same scale, 3D location, and orientation, and the camera always points to the center of the scene.

3D Generation

Invertible Neural Skinning

no code implementations • CVPR 2023 • Yash Kant, Aliaksandr Siarohin, Riza Alp Guler, Menglei Chai, Jian Ren, Sergey Tulyakov, Igor Gilitschenski

Next, we combine PIN with a differentiable LBS module to build an expressive and end-to-end Invertible Neural Skinning (INS) pipeline.

Unsupervised Volumetric Animation

no code implementations • CVPR 2023 • Aliaksandr Siarohin, Willi Menapace, Ivan Skorokhodov, Kyle Olszewski, Jian Ren, Hsin-Ying Lee, Menglei Chai, Sergey Tulyakov

We propose a novel approach for unsupervised 3D animation of non-rigid deformable objects.

Keypoint Estimation Novel View Synthesis

InfiniCity: Infinite-Scale City Synthesis

no code implementations • ICCV 2023 • Chieh Hubert Lin, Hsin-Ying Lee, Willi Menapace, Menglei Chai, Aliaksandr Siarohin, Ming-Hsuan Yang, Sergey Tulyakov

Toward infinite-scale 3D city synthesis, we propose a novel framework, InfiniCity, which constructs and renders an unconstrainedly large and 3D-grounded environment from random noises.

Image Generation Neural Rendering

3DAvatarGAN: Bridging Domains for Personalized Editable Avatars

no code implementations • CVPR 2023 • Rameen Abdal, Hsin-Ying Lee, Peihao Zhu, Menglei Chai, Aliaksandr Siarohin, Peter Wonka, Sergey Tulyakov

Finally, we propose a novel inversion method for 3D-GANs linking the latent spaces of the source and the target domains.

ShapeTalk: A Language Dataset and Framework for 3D Shape Edits and Deformations

1 code implementation • CVPR 2023 • Panos Achlioptas, Ian Huang, Minhyuk Sung, Sergey Tulyakov, Leonidas Guibas

In this work, we aim to facilitate the task of editing the geometry of 3D models through the use of natural language.

Neural Rendering

DisCoScene: Spatially Disentangled Generative Radiance Fields for Controllable 3D-aware Scene Synthesis

no code implementations • CVPR 2023 • Yinghao Xu, Menglei Chai, Zifan Shi, Sida Peng, Ivan Skorokhodov, Aliaksandr Siarohin, Ceyuan Yang, Yujun Shen, Hsin-Ying Lee, Bolei Zhou, Sergey Tulyakov

Existing 3D-aware image synthesis approaches mainly focus on generating a single canonical object and show limited capacity in composing a complex scene containing a variety of objects.

3D-Aware Image Synthesis Object

Rethinking Vision Transformers for MobileNet Size and Speed

6 code implementations • ICCV 2023 • Yanyu Li, Ju Hu, Yang Wen, Georgios Evangelidis, Kamyar Salahi, Yanzhi Wang, Sergey Tulyakov, Jian Ren

With the success of Vision Transformers (ViTs) in computer vision tasks, recent works try to optimize the performance and complexity of ViTs to enable efficient deployment on mobile devices.

30,048

Real-Time Neural Light Field on Mobile Devices

1 code implementation • CVPR 2023 • Junli Cao, Huan Wang, Pavlo Chemerys, Vladislav Shakhrai, Ju Hu, Yun Fu, Denys Makoviichuk, Sergey Tulyakov, Jian Ren

Nevertheless, to reach a similar rendering quality as NeRF, the network in NeLF is designed with intensive computation, which is not mobile-friendly.

Neural Rendering Novel View Synthesis

186

LADIS: Language Disentanglement for 3D Shape Editing

1 code implementation • 9 Dec 2022 • Ian Huang, Panos Achlioptas, Tianyi Zhang, Sergey Tulyakov, Minhyuk Sung, Leonidas Guibas

Additionally, to measure edit locality, we define a new metric that we call part-wise edit precision.

Disentanglement

SDFusion: Multimodal 3D Shape Completion, Reconstruction, and Generation

1 code implementation • CVPR 2023 • Yen-Chi Cheng, Hsin-Ying Lee, Sergey Tulyakov, Alexander Schwing, LiangYan Gui

To enable interactive generation, our method supports a variety of input modalities that can be easily provided by a human, including images, text, partially observed shapes, and combinations of these, and further allows adjusting the strength of each input.

3D Reconstruction 3D Shape Generation +3

370

Make-A-Story: Visual Memory Conditioned Consistent Story Generation

1 code implementation • CVPR 2023 • Tanzila Rahman, Hsin-Ying Lee, Jian Ren, Sergey Tulyakov, Shweta Mahajan, Leonid Sigal

Our experiments for story generation on the MUGEN, PororoSV, and FlintstonesSV datasets show that our method not only outperforms the prior state of the art in generating frames with high visual quality that are consistent with the story, but also models appropriate correspondences between the characters and the background.

Sentence Story Generation +1

Affection: Learning Affective Explanations for Real-World Visual Data

no code implementations • CVPR 2023 • Panos Achlioptas, Maks Ovsjanikov, Leonidas Guibas, Sergey Tulyakov

To embark on this journey, we introduce and share with the research community a large-scale dataset that contains emotional reactions and free-form textual explanations for 85,007 publicly available images, analyzed by 6,283 annotators who were asked to indicate and explain how and why they felt in a particular way when observing a specific image, producing a total of 526,749 responses.

Layer Freezing & Data Sieving: Missing Pieces of a Generic Framework for Sparse Training

1 code implementation • 22 Sep 2022 • Geng Yuan, Yanyu Li, Sheng Li, Zhenglun Kong, Sergey Tulyakov, Xulong Tang, Yanzhi Wang, Jian Ren

Therefore, we analyze the feasibility and potential of using the layer freezing technique in sparse training and find that it can save considerable training costs.

Cross-Modal 3D Shape Generation and Manipulation

no code implementations • 24 Jul 2022 • Zezhou Cheng, Menglei Chai, Jian Ren, Hsin-Ying Lee, Kyle Olszewski, Zeng Huang, Subhransu Maji, Sergey Tulyakov

In this paper, we propose a generic multi-modal generative model that couples the 2D modalities and implicit 3D representations through shared latent spaces.

3D Generation 3D Shape Generation

EpiGRAF: Rethinking training of 3D GANs

1 code implementation • 21 Jun 2022 • Ivan Skorokhodov, Sergey Tulyakov, Yiqun Wang, Peter Wonka

In this work, we show that it is possible to obtain a high-resolution 3D generator with SotA image quality by following a completely different route of simply training the model patch-wise.

3D-Aware Image Synthesis

150

Discrete Contrastive Diffusion for Cross-Modal Music and Image Generation

1 code implementation • 15 Jun 2022 • Ye Zhu, Yu Wu, Kyle Olszewski, Jian Ren, Sergey Tulyakov, Yan Yan

Diffusion probabilistic models (DPMs) have become a popular approach to conditional generation, due to their promising results and support for cross-modal synthesis.

Contrastive Learning Denoising +2

152

EfficientFormer: Vision Transformers at MobileNet Speed

10 code implementations • 2 Jun 2022 • Yanyu Li, Geng Yuan, Yang Wen, Ju Hu, Georgios Evangelidis, Sergey Tulyakov, Yanzhi Wang, Jian Ren

Our work proves that properly designed transformers can reach extremely low latency on mobile devices while maintaining high performance.

30,048

Control-NeRF: Editable Feature Volumes for Scene Rendering and Manipulation

no code implementations • 22 Apr 2022 • Verica Lazova, Vladimir Guzov, Kyle Olszewski, Sergey Tulyakov, Gerard Pons-Moll

With the aim of obtaining interpretable and controllable scene representations, our model couples learnt scene-specific feature volumes with a scene agnostic neural rendering network.

Neural Rendering Novel View Synthesis

Quantized GAN for Complex Music Generation from Dance Videos

1 code implementation • 1 Apr 2022 • Ye Zhu, Kyle Olszewski, Yu Wu, Panos Achlioptas, Menglei Chai, Yan Yan, Sergey Tulyakov

We present Dance2Music-GAN (D2M-GAN), a novel adversarial multi-modal framework that generates complex musical samples conditioned on dance videos.

Music Generation

R2L: Distilling Neural Radiance Field to Neural Light Field for Efficient Novel View Synthesis

1 code implementation • 31 Mar 2022 • Huan Wang, Jian Ren, Zeng Huang, Kyle Olszewski, Menglei Chai, Yun Fu, Sergey Tulyakov

On the other hand, Neural Light Field (NeLF) presents a more straightforward representation over NeRF in novel view synthesis -- the rendering of a pixel amounts to one single forward pass without ray-marching.

Novel View Synthesis

183

Show Me What and Tell Me How: Video Synthesis via Multimodal Conditioning

1 code implementation • CVPR 2022 • Ligong Han, Jian Ren, Hsin-Ying Lee, Francesco Barbieri, Kyle Olszewski, Shervin Minaee, Dimitris Metaxas, Sergey Tulyakov

In addition, our model can extract visual information as suggested by the text prompt, e.g., "an object in image one is moving northeast", and generate corresponding videos.

Self-Learning Text Augmentation +1

187

Playable Environments: Video Manipulation in Space and Time

1 code implementation • CVPR 2022 • Willi Menapace, Stéphane Lathuilière, Aliaksandr Siarohin, Christian Theobalt, Sergey Tulyakov, Vladislav Golyanik, Elisa Ricci

We present Playable Environments - a new representation for interactive video generation and manipulation in space and time.

Video Generation

F8Net: Fixed-Point 8-bit Only Multiplication for Network Quantization

1 code implementation • ICLR 2022 • Qing Jin, Jian Ren, Richard Zhuang, Sumant Hanumante, Zhengang Li, Zhiyu Chen, Yanzhi Wang, Kaiyuan Yang, Sergey Tulyakov

Our approach achieves comparable or better performance compared not only to existing quantization techniques with INT32 multiplication or floating-point arithmetic but also to the full-precision counterparts, achieving state-of-the-art performance.

Quantization

NeROIC: Neural Rendering of Objects from Online Image Collections

1 code implementation • 7 Jan 2022 • Zhengfei Kuang, Kyle Olszewski, Menglei Chai, Zeng Huang, Panos Achlioptas, Sergey Tulyakov

We present a novel method to acquire object representations from online image collections, capturing high-quality geometry and material properties of arbitrary objects from photographs with varying cameras, illumination, and backgrounds.

Neural Rendering Novel View Synthesis +1

915

InOut: Diverse Image Outpainting via GAN Inversion

no code implementations • CVPR 2022 • Yen-Chi Cheng, Chieh Hubert Lin, Hsin-Ying Lee, Jian Ren, Sergey Tulyakov, Ming-Hsuan Yang

Existing image outpainting methods pose the problem as a conditional image-to-image translation task, often generating repetitive structures and textures by replicating the content available in the input image.

Image Outpainting Image-to-Image Translation

StyleGAN-V: A Continuous Video Generator with the Price, Image Quality and Perks of StyleGAN2

1 code implementation • CVPR 2022 • Ivan Skorokhodov, Sergey Tulyakov, Mohamed Elhoseiny

We build our model on top of StyleGAN2 and it is just ≈5% more expensive to train at the same resolution while achieving almost the same image quality.

Video Generation

323

Flow Guided Transformable Bottleneck Networks for Motion Retargeting

no code implementations • CVPR 2021 • Jian Ren, Menglei Chai, Oliver J. Woodford, Kyle Olszewski, Sergey Tulyakov

Human motion retargeting aims to transfer the motion of one person in a "driving" video or set of images to another person.

Image Generation motion retargeting

A Good Image Generator Is What You Need for High-Resolution Video Synthesis

1 code implementation • ICLR 2021 • Yu Tian, Jian Ren, Menglei Chai, Kyle Olszewski, Xi Peng, Dimitris N. Metaxas, Sergey Tulyakov

We introduce a motion generator that discovers the desired trajectory, in which content and motion are disentangled.

Ranked #32 on Video Generation on UCF-101

Video Generation

237

Motion Representations for Articulated Animation

2 code implementations • CVPR 2021 • Aliaksandr Siarohin, Oliver J. Woodford, Jian Ren, Menglei Chai, Sergey Tulyakov

To facilitate animation and prevent the leakage of the shape of the driving object, we disentangle shape and pose of objects in the region space.

Ranked #1 on Video Reconstruction on Tai-Chi-HD (512)

Object Video Reconstruction

14,238

InfinityGAN: Towards Infinite-Pixel Image Synthesis

1 code implementation • ICLR 2022 • Chieh Hubert Lin, Hsin-Ying Lee, Yen-Chi Cheng, Sergey Tulyakov, Ming-Hsuan Yang

We present a novel framework, InfinityGAN, for arbitrary-sized image generation.

Ranked #2 on Scene Generation on OSM

Image Generation Scene Generation

319

In&Out : Diverse Image Outpainting via GAN Inversion

no code implementations • 1 Apr 2021 • Yen-Chi Cheng, Chieh Hubert Lin, Hsin-Ying Lee, Jian Ren, Sergey Tulyakov, Ming-Hsuan Yang

Image Outpainting Image-to-Image Translation +1

SMIL: Multimodal Learning with Severely Missing Modality

1 code implementation • 9 Mar 2021 • Mengmeng Ma, Jian Ren, Long Zhao, Sergey Tulyakov, Cathy Wu, Xi Peng

A common assumption in multimodal learning is the completeness of training data, i.e., full modalities are available in all training examples.

Meta-Learning

Teachers Do More Than Teach: Compressing Image-to-Image Models

1 code implementation • CVPR 2021 • Qing Jin, Jian Ren, Oliver J. Woodford, Jiazhuo Wang, Geng Yuan, Yanzhi Wang, Sergey Tulyakov

In this work, we aim to address these issues by introducing a teacher network that provides a search space in which efficient network architectures can be found, in addition to performing knowledge distillation.

Knowledge Distillation

176

Playable Video Generation

1 code implementation • CVPR 2021 • Willi Menapace, Stéphane Lathuilière, Sergey Tulyakov, Aliaksandr Siarohin, Elisa Ricci

This paper introduces the unsupervised learning problem of playable video generation (PVG).

Decoder Video Generation

150

MichiGAN: Multi-Input-Conditioned Hair Image Generation for Portrait Editing

1 code implementation • 30 Oct 2020 • Zhentao Tan, Menglei Chai, Dongdong Chen, Jing Liao, Qi Chu, Lu Yuan, Sergey Tulyakov, Nenghai Yu

In this paper, we present MichiGAN (Multi-Input-Conditioned Hair Image GAN), a novel conditional image generation method for interactive portrait hair manipulation.

Conditional Image Generation

291

Interactive Video Stylization Using Few-Shot Patch-Based Training

2 code implementations • 29 Apr 2020 • Ondřej Texler, David Futschik, Michal Kučera, Ondřej Jamriška, Šárka Sochorová, Menglei Chai, Sergey Tulyakov, Daniel Sýkora

In this paper, we present a learning-based method for keyframe-based video stylization that allows an artist to propagate the style from a few selected keyframes to the rest of the sequence.

Style Transfer Translation +1

606

Neural Hair Rendering

no code implementations • ECCV 2020 • Menglei Chai, Jian Ren, Sergey Tulyakov

Unlike existing supervised translation methods that require model-level similarity to preserve consistent structure representation for both real images and fake renderings, our method adopts an unsupervised solution to work on arbitrary hair models.

Translation

Motion-supervised Co-Part Segmentation

2 code implementations • 7 Apr 2020 • Aliaksandr Siarohin, Subhankar Roy, Stéphane Lathuilière, Sergey Tulyakov, Elisa Ricci, Nicu Sebe

To overcome this limitation, we propose a self-supervised deep learning method for co-part segmentation.

Ranked #3 on Unsupervised Human Pose Estimation on Tai-Chi-HD

Segmentation Unsupervised Human Pose Estimation

644

Human Motion Transfer from Poses in the Wild

no code implementations • 7 Apr 2020 • Jian Ren, Menglei Chai, Sergey Tulyakov, Chen Fang, Xiaohui Shen, Jianchao Yang

In this paper, we tackle the problem of human motion transfer, where we synthesize novel motion video for a target person that imitates the movement from a reference video.

Translation

First Order Motion Model for Image Animation

2 code implementations • NeurIPS 2019 • Aliaksandr Siarohin, Stéphane Lathuilière, Sergey Tulyakov, Elisa Ricci, Nicu Sebe

To achieve this, we decouple appearance and motion information using a self-supervised formulation.

Ranked #1 on Video Reconstruction on Tai-Chi-HD

Image Animation Object +1

14,238

Task-Assisted Domain Adaptation with Anchor Tasks

no code implementations • 16 Aug 2019 • Zhizhong Li, Linjie Luo, Sergey Tulyakov, Qieyun Dai, Derek Hoiem

Our key idea to improve domain adaptation is to introduce a separate anchor task (such as facial landmarks) whose annotations can be obtained at no cost or are already available on both synthetic and real datasets.

Depth Estimation Domain Adaptation +2

Transformable Bottleneck Networks

1 code implementation • ICCV 2019 • Kyle Olszewski, Sergey Tulyakov, Oliver Woodford, Hao Li, Linjie Luo

We propose a novel approach to performing fine-grained 3D manipulation of image content via a convolutional neural network, which we call the Transformable Bottleneck Network (TBN).

3D Reconstruction Decoder +1

Train One Get One Free: Partially Supervised Neural Network for Bug Report Duplicate Detection and Clustering

no code implementations • NAACL 2019 • Lahari Poddar, Leonardo Neves, William Brendel, Luis Marujo, Sergey Tulyakov, Pradeep Karuturi

Leveraging the assumption that learning the topic of a bug is a sub-task for detecting duplicates, we design a loss function that can jointly perform both tasks but needs supervision for only duplicate classification, achieving topic clustering in an unsupervised fashion.

Clustering General Classification

3D Guided Fine-Grained Face Manipulation

no code implementations • CVPR 2019 • Zhenglin Geng, Chen Cao, Sergey Tulyakov

This is achieved by first fitting a 3D face model and then disentangling the face into a texture and a shape.

Face Model

Animating Arbitrary Objects via Deep Motion Transfer

1 code implementation • CVPR 2019 • Aliaksandr Siarohin, Stéphane Lathuilière, Sergey Tulyakov, Elisa Ricci, Nicu Sebe

This is achieved through a deep architecture that decouples appearance and motion information.

Image Animation motion prediction +2

463

Hybrid VAE: Improving Deep Generative Models using Partial Observations

no code implementations • 30 Nov 2017 • Sergey Tulyakov, Andrew Fitzgibbon, Sebastian Nowozin

We show that such a combination is beneficial because the unlabeled data acts as a data-driven form of regularization, allowing generative models trained on few labeled samples to reach the performance of fully-supervised generative models trained on much larger datasets.

MoCoGAN: Decomposing Motion and Content for Video Generation

5 code implementations • CVPR 2018 • Sergey Tulyakov, Ming-Yu Liu, Xiaodong Yang, Jan Kautz

The proposed framework generates a video by mapping a sequence of random vectors to a sequence of video frames.

Ranked #4 on Video Generation on UCF-101 16 frames, Unconditional, Single GPU

Generative Adversarial Network Video Generation

563

Self-Adaptive Matrix Completion for Heart Rate Estimation From Face Videos Under Realistic Conditions

no code implementations • CVPR 2016 • Sergey Tulyakov, Xavier Alameda-Pineda, Elisa Ricci, Lijun Yin, Jeffrey F. Cohn, Nicu Sebe

Recent studies in computer vision have shown that, while practically invisible to a human observer, skin color changes due to blood flow can be captured on face videos and, surprisingly, be used to estimate the heart rate (HR).

Heart rate estimation Matrix Completion

Regressing a 3D Face Shape From a Single Image

no code implementations • ICCV 2015 • Sergey Tulyakov, Nicu Sebe

To support the ability of our method to reliably reconstruct 3D shapes, we introduce a simple method for head pose estimation using a single image that reaches higher accuracy than the state of the art.

Head Pose Estimation
