Search Results for author: Kang Zhao

Found 23 papers, 7 papers with code

Accelerating Transformer Pre-Training with 2:4 Sparsity

no code implementations • 2 Apr 2024 • Yuezhou Hu, Kang Zhao, Weiyu Huang, Jianfei Chen, Jun Zhu

Training large Transformers is slow, but recent innovations in GPU architecture give us an advantage.

Jetfire: Efficient and Accurate Transformer Pretraining with INT8 Data Flow and Per-Block Quantization

no code implementations • 19 Mar 2024 • Haocheng Xi, Yuxiang Chen, Kang Zhao, Kaijun Zheng, Jianfei Chen, Jun Zhu

Moreover, for a standard transformer block, our method offers an end-to-end training speedup of 1.42x and a 1.49x memory reduction compared to the FP16 baseline.

Quantization

AE-NeRF: Audio Enhanced Neural Radiance Field for Few Shot Talking Head Synthesis

no code implementations • 18 Dec 2023 • Dongze Li, Kang Zhao, Wei Wang, Bo Peng, Yingya Zhang, Jing Dong, Tieniu Tan

Audio-driven talking head synthesis is a promising topic with wide applications in digital humans, filmmaking, and virtual reality.

Talking Head Generation

I2VGen-XL: High-Quality Image-to-Video Synthesis via Cascaded Diffusion Models

3 code implementations • 7 Nov 2023 • Shiwei Zhang, Jiayu Wang, Yingya Zhang, Kang Zhao, Hangjie Yuan, Zhiwu Qin, Xiang Wang, Deli Zhao, Jingren Zhou

By this means, I2VGen-XL can simultaneously enhance the semantic accuracy, continuity of details and clarity of generated videos.

DeltaSpace: A Semantic-aligned Feature Space for Flexible Text-guided Image Editing

1 code implementation • 12 Oct 2023 • Yueming Lyu, Kang Zhao, Bo Peng, Yue Jiang, Yingya Zhang, Jing Dong

Based on DeltaSpace, we propose a novel framework called DeltaEdit, which maps the CLIP visual feature differences to the latent space directions of a generative model during the training phase, and predicts the latent space directions from the CLIP textual feature differences during the inference phase.

text-guided-image-editing

Freestyle 3D-Aware Portrait Synthesis Based on Compositional Generative Priors

no code implementations • 27 Jun 2023 • Tianxiang Ma, Kang Zhao, Jianxin Sun, Yingya Zhang, Jing Dong

Efficiently generating a freestyle 3D portrait with high quality and 3D-consistency is a promising yet challenging task.

UniMC: A Unified Framework for Long-Term Memory Conversation via Relevance Representation Learning

no code implementations • 18 Jun 2023 • Kang Zhao, Wei Liu, Jian Luan, Minglei Gao, Li Qian, Hanlin Teng, Bin Wang

In this paper, we propose a Unified framework for Long-term Memory Conversations (UniMC), which increases the connection between different stages by learning relevance representation.

Decoder, Representation Learning, +1

Learning Residual Model of Model Predictive Control via Random Forests for Autonomous Driving

no code implementations • 10 Apr 2023 • Kang Zhao, Jianru Xue, Xiangning Meng, Gengxin Li, Mengsen Wu

One major issue in learning-based model predictive control (MPC) for autonomous driving is the contradiction between the system model's prediction accuracy and computation efficiency.

Autonomous Driving, Model Predictive Control, +1

RiDDLE: Reversible and Diversified De-identification with Latent Encryptor

1 code implementation • CVPR 2023 • Dongze Li, Wei Wang, Kang Zhao, Jing Dong, Tieniu Tan

This work presents RiDDLE, short for Reversible and Diversified De-identification with Latent Encryptor, to protect the identity information of people from being misused.

De-identification

Semi-MAE: Masked Autoencoders for Semi-supervised Vision Transformers

no code implementations • 4 Jan 2023 • Haojie Yu, Kang Zhao, Xiaoming Xu

To alleviate this issue, inspired by masked autoencoder (MAE), which is a data-efficient self-supervised learner, we propose Semi-MAE, a pure ViT-based SSL framework consisting of a parallel MAE branch to assist the visual representation learning and make the pseudo labels more accurate.

Decoder, Representation Learning, +1

LipFormer: High-Fidelity and Generalizable Talking Face Generation With a Pre-Learned Facial Codebook

no code implementations • CVPR 2023 • Jiayu Wang, Kang Zhao, Shiwei Zhang, Yingya Zhang, Yujun Shen, Deli Zhao, Jingren Zhou

Generating a talking face video from the input audio sequence is a practical yet challenging task.

Talking Face Generation

Consistent Representation Learning for Continual Relation Extraction

1 code implementation • Findings (ACL) 2022 • Kang Zhao, Hua Xu, Jiangong Yang, Kai Gao

Specifically, supervised contrastive learning based on a memory bank is first used to train each new task so that the model can effectively learn the relation representation.

Continual Relation Extraction, Contrastive Learning, +3

TEXTOIR: An Integrated and Visualized Platform for Text Open Intent Recognition

2 code implementations • ACL 2021 • Hanlei Zhang, Xiaoteng Li, Hua Xu, Panpan Zhang, Kang Zhao, Kai Gao

It is composed of two main modules: open intent detection and open intent discovery.

Intent Discovery, Intent Recognition, +3

Communication Efficient SGD via Gradient Sampling With Bayes Prior

no code implementations • CVPR 2021 • Liuyihan Song, Kang Zhao, Pan Pan, Yu Liu, Yingya Zhang, Yinghui Xu, Rong Jin

Different from all of them, we regard large and small gradients selection as the exploitation and exploration of gradient information, respectively.

Image Classification, object-detection, +2

Representation Iterative Fusion based on Heterogeneous Graph Neural Network for Joint Entity and Relation Extraction

1 code implementation • 8 May 2021 • Kang Zhao, Hua Xu, Yue Cheng, Xiaoteng Li, Kai Gao

Joint entity and relation extraction is an essential task in information extraction, which aims to extract all relational triples from unstructured text.

Ranked #2 on Relation Extraction on SemEval-2010 Task-8

Joint Entity and Relation Extraction, Relation, +2

Visual Search at Alibaba

no code implementations • 9 Feb 2021 • Yanhao Zhang, Pan Pan, Yun Zheng, Kang Zhao, Yingya Zhang, Xiaofeng Ren, Rong Jin

We hope visual search at Alibaba becomes more widely incorporated into today's commercial applications.

Image Retrieval

Large-Scale Training System for 100-Million Classification at Alibaba

no code implementations • 9 Feb 2021 • Liuyihan Song, Pan Pan, Kang Zhao, Hao Yang, Yiming Chen, Yingya Zhang, Yinghui Xu, Rong Jin

In recent decades, extreme classification has become an essential topic for deep learning.

Classification, General Classification

Large-Scale Visual Search with Binary Distributed Graph at Alibaba

no code implementations • 9 Feb 2021 • Kang Zhao, Pan Pan, Yun Zheng, Yanhao Zhang, Changxu Wang, Yingya Zhang, Yinghui Xu, Rong Jin

For a deployed visual search system with several billions of online images in total, building a billion-scale offline graph in hours is essential, which is almost unachievable by most existing methods.

graph construction

Distribution Adaptive INT8 Quantization for Training CNNs

no code implementations • 9 Feb 2021 • Kang Zhao, Sida Huang, Pan Pan, Yinghan Li, Yingya Zhang, Zhenyu Gu, Yinghui Xu

Research has demonstrated that low bit-width (e.g., INT8) quantization can be employed to accelerate the inference process.

Image Classification, object-detection, +3

Virtual ID Discovery from E-commerce Media at Alibaba: Exploiting Richness of User Click Behavior for Visual Search Relevance

no code implementations • 9 Feb 2021 • Yanhao Zhang, Pan Pan, Yun Zheng, Kang Zhao, Jianmin Wu, Yinghui Xu, Rong Jin

By exploiting user click data, our networks encode richer supervision more effectively and better distinguish real-shot images in terms of category and feature.

Boundary Regularized Building Footprint Extraction From Satellite Images Using Deep Neural Networks

no code implementations • arXiv 2020 • Kang Zhao, Muhammad Kamran, Gunho Sohn

The proposed deep learning method consists of a two-stage object detection network to produce region of interest (RoI) features and a building boundary extraction network using graph models to learn geometric information of the polygon shapes.

Object, object-detection, +2

Boundary Regularized Building Footprint Extraction From Satellite Images Using Deep Neural Network

no code implementations • 23 Jun 2020 • Kang Zhao, Muhammad Kamran, Gunho Sohn

Object, object-detection, +2

Early Predictions of Movie Success: the Who, What, and When of Profitability

2 code implementations • 17 Jun 2015 • Michael T. Lash, Kang Zhao

This paper proposes a decision support system to aid movie investment decisions at the early stage of movie productions.
