no code implementations • 2 Apr 2024 • Yuezhou Hu, Kang Zhao, Weiyu Huang, Jianfei Chen, Jun Zhu
Training large Transformers is slow, but recent innovations in GPU architecture give us an advantage.
no code implementations • 19 Mar 2024 • Haocheng Xi, Yuxiang Chen, Kang Zhao, Kaijun Zheng, Jianfei Chen, Jun Zhu
Moreover, for a standard transformer block, our method offers an end-to-end training speedup of 1.42x and a 1.49x memory reduction compared to the FP16 baseline.
no code implementations • 18 Dec 2023 • Dongze Li, Kang Zhao, Wei Wang, Bo Peng, Yingya Zhang, Jing Dong, Tieniu Tan
Audio-driven talking head synthesis is a promising topic with wide applications in digital humans, filmmaking, and virtual reality.
3 code implementations • 7 Nov 2023 • Shiwei Zhang, Jiayu Wang, Yingya Zhang, Kang Zhao, Hangjie Yuan, Zhiwu Qin, Xiang Wang, Deli Zhao, Jingren Zhou
By this means, I2VGen-XL can simultaneously enhance the semantic accuracy, continuity of details and clarity of generated videos.
1 code implementation • 12 Oct 2023 • Yueming Lyu, Kang Zhao, Bo Peng, Yue Jiang, Yingya Zhang, Jing Dong
Based on DeltaSpace, we propose a novel framework called DeltaEdit, which maps the CLIP visual feature differences to the latent space directions of a generative model during the training phase, and predicts the latent space directions from the CLIP textual feature differences during the inference phase.
no code implementations • 27 Jun 2023 • Tianxiang Ma, Kang Zhao, Jianxin Sun, Yingya Zhang, Jing Dong
Efficiently generating a freestyle 3D portrait with high quality and 3D-consistency is a promising yet challenging task.
no code implementations • 18 Jun 2023 • Kang Zhao, Wei Liu, Jian Luan, Minglei Gao, Li Qian, Hanlin Teng, Bin Wang
In this paper, we propose a Unified framework for Long-term Memory Conversations (UniMC), which increases the connection between different stages by learning relevance representation.
no code implementations • 10 Apr 2023 • Kang Zhao, Jianru Xue, Xiangning Meng, Gengxin Li, Mengsen Wu
One major issue in learning-based model predictive control (MPC) for autonomous driving is the contradiction between the system model's prediction accuracy and computation efficiency.
1 code implementation • CVPR 2023 • Dongze Li, Wei Wang, Kang Zhao, Jing Dong, Tieniu Tan
This work presents RiDDLE, short for Reversible and Diversified De-identification with Latent Encryptor, to protect the identity information of people from being misused.
no code implementations • 4 Jan 2023 • Haojie Yu, Kang Zhao, Xiaoming Xu
To alleviate this issue, inspired by masked autoencoder (MAE), which is a data-efficient self-supervised learner, we propose Semi-MAE, a pure ViT-based SSL framework consisting of a parallel MAE branch to assist the visual representation learning and make the pseudo labels more accurate.
no code implementations • CVPR 2023 • Jiayu Wang, Kang Zhao, Shiwei Zhang, Yingya Zhang, Yujun Shen, Deli Zhao, Jingren Zhou
Generating a talking face video from the input audio sequence is a practical yet challenging task.
1 code implementation • Findings (ACL) 2022 • Kang Zhao, Hua Xu, Jiangong Yang, Kai Gao
Specifically, supervised contrastive learning based on a memory bank is first used to train each new task so that the model can effectively learn the relation representation.
2 code implementations • ACL 2021 • Hanlei Zhang, Xiaoteng Li, Hua Xu, Panpan Zhang, Kang Zhao, Kai Gao
It is composed of two main modules: open intent detection and open intent discovery.
no code implementations • CVPR 2021 • Liuyihan Song, Kang Zhao, Pan Pan, Yu Liu, Yingya Zhang, Yinghui Xu, Rong Jin
Different from all of them, we regard large and small gradients selection as the exploitation and exploration of gradient information, respectively.
1 code implementation • 8 May 2021 • Kang Zhao, Hua Xu, Yue Cheng, Xiaoteng Li, Kai Gao
Joint entity and relation extraction is an essential task in information extraction, which aims to extract all relational triples from unstructured text.
Ranked #2 on Relation Extraction on SemEval-2010 Task-8
no code implementations • 9 Feb 2021 • Yanhao Zhang, Pan Pan, Yun Zheng, Kang Zhao, Yingya Zhang, Xiaofeng Ren, Rong Jin
We hope visual search at Alibaba becomes more widely incorporated into today's commercial applications.
no code implementations • 9 Feb 2021 • Liuyihan Song, Pan Pan, Kang Zhao, Hao Yang, Yiming Chen, Yingya Zhang, Yinghui Xu, Rong Jin
In recent decades, extreme classification has become an essential topic for deep learning.
no code implementations • 9 Feb 2021 • Kang Zhao, Pan Pan, Yun Zheng, Yanhao Zhang, Changxu Wang, Yingya Zhang, Yinghui Xu, Rong Jin
For a deployed visual search system with several billions of online images in total, building a billion-scale offline graph in hours is essential, which is almost unachievable by most existing methods.
no code implementations • 9 Feb 2021 • Kang Zhao, Sida Huang, Pan Pan, Yinghan Li, Yingya Zhang, Zhenyu Gu, Yinghui Xu
Research has demonstrated that low bit-width (e.g., INT8) quantization can be employed to accelerate the inference process.
no code implementations • 9 Feb 2021 • Yanhao Zhang, Pan Pan, Yun Zheng, Kang Zhao, Jianmin Wu, Yinghui Xu, Rong Jin
Benefiting from the exploration of user click data, our networks encode richer supervision more effectively and better distinguish real-shot images in terms of category and feature.
no code implementations • arXiv 2020 • Kang Zhao, Muhammad Kamran, Gunho Sohn
The proposed deep learning method consists of a two-stage object detection network to produce region of interest (RoI) features and a building boundary extraction network using graph models to learn geometric information of the polygon shapes.
no code implementations • 23 Jun 2020 • Kang Zhao, Muhammad Kamran, Gunho Sohn
The proposed deep learning method consists of a two-stage object detection network to produce region of interest (RoI) features and a building boundary extraction network using graph models to learn geometric information of the polygon shapes.
2 code implementations • 17 Jun 2015 • Michael T. Lash, Kang Zhao
This paper proposes a decision support system to aid movie investment decisions at the early stage of movie productions.