no code implementations • 29 Apr 2024 • Tianyidan Xie, Rui Ma, Qian Wang, Xiaoqian Ye, Feixuan Liu, Ying Tai, Zhenyu Zhang, Zili Yi
In the image generation module, we employ a text-guided canny-to-image generation model to create a template image from the edge map of the foreground image and language prompts, and an image refiner that produces the final result by blending the input foreground with the template image.
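A minimal toy sketch of the pipeline this entry describes: extract an edge map from the foreground, pass it with a prompt to a template generator, then blend the foreground back over the template. Every function below (a gradient-based edge detector standing in for Canny, a stub generator, a mask blend) is an illustrative assumption, not the paper's code.

```python
def edge_map(image, threshold=0.5):
    """Binary edge map from finite-difference gradients (a stand-in for Canny)."""
    h, w = len(image), len(image[0])
    edges = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = image[y][x + 1] - image[y][x - 1]
            gy = image[y + 1][x] - image[y - 1][x]
            if (gx * gx + gy * gy) ** 0.5 > threshold:
                edges[y][x] = 1
    return edges


def generate_template(edges, prompt):
    """Stub for the text-guided canny-to-image model: edges stay bright,
    the background fill is steered by the prompt (purely illustrative)."""
    fill = 0.8 if "bright" in prompt else 0.2
    return [[1.0 if e else fill for e in row] for row in edges]


def blend(foreground, mask, template):
    """Image-refiner stand-in: keep foreground pixels where the mask is set,
    take the template elsewhere."""
    return [
        [fg if m else tp for fg, m, tp in zip(frow, mrow, trow)]
        for frow, mrow, trow in zip(foreground, mask, template)
    ]
```

On a tiny grayscale grid, `blend(fg, mask, generate_template(edge_map(fg), prompt))` chains the three stages end to end.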
no code implementations • 17 Mar 2024 • Ye Wang, Zili Yi, Rui Ma
Personalized text-to-image (T2I) models not only produce lifelike and varied visuals but also allow users to tailor the images to fit their personal taste.
no code implementations • 15 Mar 2024 • Peng Zheng, Tao Liu, Zili Yi, Rui Ma
Notably, SemanticHuman-HD is also the first method to achieve 3D-aware image synthesis at $1024^2$ resolution, benefiting from our proposed 3D-aware super-resolution module.
no code implementations • 31 Jan 2023 • Bingchuan Li, Tianxiang Ma, Peng Zhang, Miao Hua, Wei Liu, Qian He, Zili Yi
Specifically, in Phase I, a W-space-oriented StyleGAN inversion network is trained and used to perform image inversion and editing, which assures the editability but sacrifices reconstruction quality.
no code implementations • CVPR 2022 • Wei Liu, Fangyue Liu, Fei Ding, Qian He, Zili Yi
The cross-modality encoder is pre-trained in a self-supervised manner to effectively capture cross- and intra-modality correlations, which facilitates content-style disentanglement and the modeling of style representations at all scales (stroke-level, component-level and character-level).
no code implementations • CVPR 2022 • Chao Xu, Jiangning Zhang, Miao Hua, Qian He, Zili Yi, Yong Liu
This paper presents a novel Region-Aware Face Swapping (RAFSwap) network to achieve identity-consistent harmonious high-resolution face generation in a local-global manner: 1) a Local Facial Region-Aware (FRA) branch augments local identity-relevant features by introducing the Transformer to effectively model misaligned cross-scale semantic interaction.
2 code implementations • 1 Mar 2022 • ZiHao Wang, Wei Liu, Qian He, Xinglong Wu, Zili Yi
Once trained, the transformer can generate coherent image tokens conditioned on the text embedding that CLIP's text encoder extracts from an input text.
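The decoding loop described here can be sketched as greedy autoregressive generation conditioned on a text embedding. The `score` function below is a deterministic stand-in for the trained transformer, and every name is an illustrative assumption rather than the paper's API.

```python
def score(text_embedding, tokens, vocab_size):
    """Deterministic stand-in for the transformer's next-token logits;
    it depends on both the text embedding and the tokens generated so far."""
    base = sum(text_embedding) + sum(tokens)
    return [(base + 7 * t) % vocab_size for t in range(vocab_size)]


def generate_tokens(text_embedding, length, vocab_size=16):
    """Greedy autoregressive decoding: append the highest-scoring token,
    then re-score conditioned on the extended sequence."""
    tokens = []
    for _ in range(length):
        logits = score(text_embedding, tokens, vocab_size)
        tokens.append(max(range(vocab_size), key=logits.__getitem__))
    return tokens
```

In the real system the generated token sequence would then be decoded back to pixels by the image tokenizer's decoder; that stage is elided here.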
no code implementations • 5 Nov 2021 • Vishnu Sanjay Ramiya Srinivasan, Rui Ma, Qiang Tang, Zili Yi, Zhan Xu
Recent learning-based inpainting algorithms have achieved compelling results for completing missing regions after removing undesired objects in videos.
1 code implementation • 22 Sep 2021 • Miao Hua, Lijie Liu, Ziyang Cheng, Qian He, Bingchuan Li, Zili Yi
However, this technique does not satisfy the requirements of facial parts removal, as it is hard to obtain "ground-truth" images with real "blank" faces.
1 code implementation • 22 Sep 2021 • Bingchuan Li, Shaofei Cai, Wei Liu, Peng Zhang, Qian He, Miao Hua, Zili Yi
To address these limitations, we design a Dynamic Style Manipulation Network (DyStyle) whose structure and parameters vary by input samples, to perform nonlinear and adaptive manipulation of latent codes for flexible and precise attribute control.
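The core idea of sample-dependent manipulation can be illustrated with a tiny sketch (not DyStyle's architecture; all names are invented): instead of adding a fixed linear direction to every latent code, the edit applied to each component is gated by a weight computed from the latent itself.

```python
import math


def adaptive_edit(latent, direction, strength):
    """Input-adaptive latent edit: each component of the fixed edit direction
    is gated by a sigmoid weight computed from the latent code itself, so
    different input samples receive different effective edits."""
    gates = [1.0 / (1.0 + math.exp(-z)) for z in latent]
    return [z + strength * g * d for z, g, d in zip(latent, gates, direction)]
```

A fixed linear edit would move every latent by the same offset; here the gates make the manipulation nonlinear in the input, which is the property the abstract emphasizes.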
no code implementations • 1 Aug 2020 • Zili Yi, Qiang Tang, Vishnu Sanjay Ramiya Srinivasan, Zhan Xu
The generator only needs to be trained on small images, yet inference can be performed on images of any size.
6 code implementations • CVPR 2020 • Zili Yi, Qiang Tang, Shekoofeh Azizi, Daesik Jang, Zhan Xu
Since the network's convolutional layers only need to operate on low-resolution inputs and outputs, memory and computation costs are kept low.
Ranked #6 on Image Inpainting on Places2 val
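A 1-D toy sketch (invented names, not the paper's code) of how a low-resolution result can be lifted back to high resolution: the network only ever processes the downsampled signal, and fine detail inside the hole is recovered by borrowing residuals from nearby known pixels. The low-res inpainting generator itself is elided, so `upsample(downsample(x))` plays the role of its coarse output.

```python
def downsample(signal):
    """Average adjacent pairs (the low-resolution input the generator sees)."""
    return [(signal[i] + signal[i + 1]) / 2 for i in range(0, len(signal), 2)]


def upsample(signal):
    """Nearest-neighbour upsampling of the coarse result back to full size."""
    out = []
    for v in signal:
        out += [v, v]
    return out


def aggregate(coarse, original, hole):
    """Outside the hole keep the original pixels; inside the hole add the
    residual (original minus coarse) borrowed from the nearest known pixel
    to the upsampled coarse value, restoring high-frequency detail."""
    known = [i for i in range(len(coarse)) if i not in hole]
    out = []
    for i, c in enumerate(coarse):
        if i in hole:
            j = min(known, key=lambda k: abs(k - i))  # nearest context pixel
            out.append(c + (original[j] - coarse[j]))
        else:
            out.append(original[i])
    return out
```

Because only `downsample`d signals pass through the (elided) generator, the expensive model never touches full-resolution data, which is the cost saving the abstract describes.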
2 code implementations • 22 Mar 2018 • Zili Yi, Zhiqin Chen, Hao Cai, Wendong Mao, Minglun Gong, Hao Zhang
The key feature of BSD-GAN is that it is trained in multiple branches, progressively covering both the breadth and depth of the network, as resolutions of the training images increase to reveal finer-scale features.
6 code implementations • ICCV 2017 • Zili Yi, Hao Zhang, Ping Tan, Minglun Gong
Depending on the task complexity, thousands to millions of labeled image pairs are needed to train a conditional GAN.
Ranked #2 on Image-to-Image Translation on Aerial-to-Map