no code implementations • 29 Apr 2024 • Donggyun Kim, Seongwoong Cho, Semin Kim, Chong Luo, Seunghoon Hong
In this study, we explore a universal model that can flexibly adapt to unseen dense label structures with a few examples, enabling it to serve as a data-efficient vision generalist in diverse real-world scenarios.
no code implementations • 3 Jan 2024 • Minchan Kim, Myeonghun Jeong, Byoung Jin Choi, Semin Kim, Joun Yeop Lee, Nam Soo Kim
We also delve into the inference speed and prosody control capabilities of our approach, highlighting the potential of neural transducers in TTS frameworks.
1 code implementation • CVPR 2023 • Jaehoon Yoo, Semin Kim, Doyup Lee, Chiheon Kim, Seunghoon Hong
However, transformers are prevented from directly learning long-term dependencies in videos due to the quadratic complexity of self-attention, and they inherently suffer from slow inference and error propagation due to the autoregressive process.
Ranked #24 on Video Generation on UCF-101
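The quadratic self-attention cost mentioned in this entry can be sketched with a minimal NumPy implementation; the shapes and frame/grid sizes below are illustrative assumptions, not the paper's actual configuration:

```python
import numpy as np

def self_attention(x):
    """Naive scaled dot-product self-attention over tokens x of shape (L, d).
    The L x L score matrix makes time and memory grow quadratically with L."""
    L, d = x.shape
    scores = x @ x.T / np.sqrt(d)                 # (L, L): the quadratic bottleneck
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ x                            # (L, d)

# A video of T frames, each tokenized into an h x w grid, yields L = T*h*w tokens,
# so the attention score matrix has (T*h*w)^2 entries -- this is why video length
# is so costly for transformer-based generation.
T, h, w, d = 16, 8, 8, 64
L = T * h * w                  # 1024 tokens
x = np.random.randn(L, d)
out = self_attention(x)
print(out.shape, L * L)        # (1024, 64) 1048576 score entries
```

Doubling the number of frames quadruples the score matrix, which is the scaling behavior the abstract refers to.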