Search Results for author: Thomas Hayes

Found 6 papers, 3 papers with code

Make-An-Animation: Large-Scale Text-conditional 3D Human Motion Generation

no code implementations • ICCV 2023 • Samaneh Azadi, Akbar Shah, Thomas Hayes, Devi Parikh, Sonal Gupta

However, existing approaches are limited by their reliance on relatively small-scale motion capture data, leading to poor performance on more diverse, in-the-wild prompts.

Ranked #20 on Motion Synthesis on HumanML3D

Motion Synthesis, Text-to-Video Generation, +1 more

Text-Conditional Contextualized Avatars For Zero-Shot Personalization

no code implementations • 14 Apr 2023 • Samaneh Azadi, Thomas Hayes, Akbar Shah, Guan Pang, Devi Parikh, Sonal Gupta

Recent large-scale text-to-image generation models have made significant improvements in the quality, realism, and diversity of the synthesized images and enable users to control the created content through language.

Text to 3D, Text-to-Image Generation

SpaText: Spatio-Textual Representation for Controllable Image Generation

no code implementations • CVPR 2023 • Omri Avrahami, Thomas Hayes, Oran Gafni, Sonal Gupta, Yaniv Taigman, Devi Parikh, Dani Lischinski, Ohad Fried, Xi Yin

Due to the lack of large-scale datasets that have a detailed textual description for each region in the image, we choose to leverage the current large-scale text-to-image datasets and base our approach on a novel CLIP-based spatio-textual representation, and show its effectiveness on two state-of-the-art diffusion models: pixel-based and latent-based.

Text-to-Image Generation

Make-A-Video: Text-to-Video Generation without Text-Video Data

2 code implementations • 29 Sep 2022 • Uriel Singer, Adam Polyak, Thomas Hayes, Xi Yin, Jie An, Songyang Zhang, Qiyuan Hu, Harry Yang, Oron Ashual, Oran Gafni, Devi Parikh, Sonal Gupta, Yaniv Taigman

We propose Make-A-Video -- an approach for directly translating the tremendous recent progress in Text-to-Image (T2I) generation to Text-to-Video (T2V).

Ranked #3 on Text-to-Video Generation on MSR-VTT (CLIP-FID metric)