no code implementations • 3 Jan 2024 • Hexiang Hu, Kelvin C. K. Chan, Yu-Chuan Su, Wenhu Chen, Yandong Li, Kihyuk Sohn, Yang Zhao, Xue Ben, Boqing Gong, William Cohen, Ming-Wei Chang, Xuhui Jia
We introduce *multi-modal instruction* for image generation, a task representation articulating a range of generation intents with precision.
no code implementations • 5 Dec 2023 • Hsin-Ping Huang, Yu-Chuan Su, Deqing Sun, Lu Jiang, Xuhui Jia, Yukun Zhu, Ming-Hsuan Yang
To achieve detailed control, we propose a unified framework to jointly inject control signals into the existing text-to-video model.
no code implementations • ICCV 2023 • Yang Zhao, Tingbo Hou, Yu-Chuan Su, Xuhui Jia, Yandong Li, Matthias Grundmann
Authentic face restoration systems are increasingly in demand in many computer vision applications, e.g., image enhancement, video communication, and portrait capture.
no code implementations • 27 Apr 2023 • Kangning Liu, Yu-Chuan Su, Wei Hong, Ruijin Cang, Xuhui Jia
The one-shot talking-head synthesis task aims to animate a source image to another pose and expression, which is dictated by a driving frame.
no code implementations • 15 Apr 2023 • Hsin-Ping Huang, Yu-Chuan Su, Ming-Hsuan Yang
We tackle the long video generation problem, i.e., generating videos beyond the output length of video generation models.
no code implementations • 14 Apr 2023 • Yu-Chuan Su, Kelvin C. K. Chan, Yandong Li, Yang Zhao, Han Zhang, Boqing Gong, Huisheng Wang, Xuhui Jia
Our approach greatly reduces the overhead for personalized image generation and is more applicable in many potential applications.
no code implementations • 5 Apr 2023 • Xuhui Jia, Yang Zhao, Kelvin C. K. Chan, Yandong Li, Han Zhang, Boqing Gong, Tingbo Hou, Huisheng Wang, Yu-Chuan Su
This paper proposes a method for generating images of customized objects specified by users.
no code implementations • CVPR 2022 • Yang Zhao, Yu-Chuan Su, Chun-Te Chu, Yandong Li, Marius Renn, Yukun Zhu, Changyou Chen, Xuhui Jia
While existing approaches to face restoration make significant progress in generating high-quality faces, they often fail to preserve facial features and cannot reconstruct faces authentically.
1 code implementation • 26 Apr 2021 • Yu-Chuan Su, Soravit Changpinyo, Xiangning Chen, Sathish Thoppay, Cho-Jui Hsieh, Lior Shapira, Radu Soricut, Hartwig Adam, Matthew Brown, Ming-Hsuan Yang, Boqing Gong
To enable progress on this task, we create a new dataset consisting of 220K human-annotated 2.5D relationships among 512K objects from 11K images.
no code implementations • 15 Apr 2021 • Yu-Chuan Su, Raviteja Vemulapalli, Ben Weiss, Chun-Te Chu, Philip Andrew Mansfield, Lior Shapira, Colvin Pitts
To address this issue, we propose a deep learning-based approach that provides suggestions to the photographer on how to adjust the camera view before capturing.
no code implementations • CVPR 2019 • Yu-Chuan Su, Kristen Grauman
KTNs efficiently transfer convolution kernels from perspective images to the equirectangular projection of 360° images.
no code implementations • CVPR 2018 • Yu-Chuan Su, Kristen Grauman
Standard video encoders developed for conventional narrow field-of-view video are widely applied to 360° video as well, with reasonable results.
no code implementations • 12 Dec 2017 • Yu-Chuan Su, Kristen Grauman
Standard video encoders developed for conventional narrow field-of-view video are widely applied to 360° video as well, with reasonable results.
no code implementations • NeurIPS 2017 • Yu-Chuan Su, Kristen Grauman
While 360° cameras offer tremendous new possibilities in vision, graphics, and augmented reality, the spherical images they produce make core feature extraction non-trivial.
no code implementations • CVPR 2017 • Yu-Chuan Su, Kristen Grauman
360° video requires human viewers to actively control "where" to look while watching the video.
no code implementations • 1 Mar 2017 • Yu-Chuan Su, Kristen Grauman
360° video requires human viewers to actively control "where" to look while watching the video.
no code implementations • 7 Dec 2016 • Yu-Chuan Su, Dinesh Jayaraman, Kristen Grauman
AutoCam leverages NFOV web video to discriminatively identify space-time "glimpses" of interest at each time instant, and then uses dynamic programming to select optimal human-like camera trajectories.
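The trajectory-selection step can be illustrated with a generic Viterbi-style dynamic program (a sketch, not the paper's implementation: the `scores`, `positions`, and quadratic smoothness penalty below are illustrative assumptions — one glimpse is chosen per frame to maximize total interest score while penalizing abrupt camera motion):

```python
import numpy as np

def select_trajectory(scores, positions, smoothness=1.0):
    """Viterbi-style DP: pick one glimpse per frame, maximizing total
    glimpse score minus a smoothness penalty on camera motion.

    scores:    (T, K) array, interest score of candidate glimpse k at frame t
    positions: (K,) array, e.g. azimuth angle of each candidate glimpse
    """
    T, K = scores.shape
    # Transition penalty: squared angular distance between consecutive glimpses.
    penalty = smoothness * (positions[None, :] - positions[:, None]) ** 2

    best = scores[0].copy()              # best total score of a path ending at each glimpse
    back = np.zeros((T, K), dtype=int)   # backpointers for path recovery
    for t in range(1, T):
        cand = best[:, None] - penalty               # cand[prev, cur]
        back[t] = np.argmax(cand, axis=0)            # best predecessor for each glimpse
        best = cand[back[t], np.arange(K)] + scores[t]

    # Backtrack the optimal camera trajectory.
    path = [int(np.argmax(best))]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]
```

For example, with two candidate directions and a small smoothness weight, the program switches views only when the gain in glimpse score outweighs the motion penalty.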
no code implementations • 4 Apr 2016 • Yu-Chuan Su, Kristen Grauman
In a wearable camera video, we see what the camera wearer sees.
no code implementations • 1 Apr 2016 • Yu-Chuan Su, Kristen Grauman
Current approaches for activity recognition often ignore constraints on computational resources: 1) they rely on extensive feature computation to obtain rich descriptors on all frames, and 2) they assume batch-mode access to the entire test video at once.
no code implementations • 15 Sep 2014 • Yu-Chuan Su, Tzu-Hsuan Chiu, Chun-Yen Yeh, Hsin-Fu Huang, Winston H. Hsu
The same lack-of-training-sample problem limits the use of deep models on a wide range of computer vision problems where obtaining training data is difficult.