no code implementations • 30 May 2024 • Wenxuan Liu, Sai Qian Zhang
Diffusion Transformers (DiTs) have recently gained substantial attention in both industry and academia for their superior visual generation capabilities, outperforming traditional diffusion models that use a U-Net backbone.
no code implementations • 8 Apr 2024 • Chao Gao, Sai Qian Zhang
Due to the scale of LLMs, PEFT operations are usually executed in public environments (e.g., cloud servers).
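PEFT methods such as LoRA illustrate why offloading fine-tuning is attractive: only small adapter matrices are trained while the large pretrained weights stay frozen. A minimal LoRA-style sketch (the class and its defaults here are illustrative assumptions, not this paper's method):

```python
import numpy as np

class LoRALinear:
    """Frozen pretrained weight W plus a trainable low-rank update B @ A.
    Illustrative LoRA-style adapter; not the method proposed in this paper."""
    def __init__(self, weight, rank=8, alpha=16.0):
        self.W = weight                                  # frozen, shape (out_dim, in_dim)
        out_dim, in_dim = weight.shape
        self.A = 0.01 * np.random.randn(rank, in_dim)    # trainable, small random init
        self.B = np.zeros((out_dim, rank))               # trainable, zero init
        self.scale = alpha / rank

    def forward(self, x):
        # Only A and B would receive gradients during fine-tuning.
        return x @ (self.W + self.scale * (self.B @ self.A)).T
```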
no code implementations • 21 Mar 2024 • Zeyu Han, Chao Gao, Jinyang Liu, Jeff Zhang, Sai Qian Zhang
In addition to the algorithmic perspective, we review various real-world system designs to investigate the implementation costs associated with different PEFT algorithms.
no code implementations • 28 Nov 2023 • YiXuan Luo, Mengye Ren, Sai Qian Zhang
This approach significantly reduces computational costs in comparison with training each DNN backbone individually.
no code implementations • 4 May 2023 • Sai Qian Zhang, Thierry Tambe, Nestor Cuevas, Gu-Yeon Wei, David Brooks
To minimize the occurrence of expensive eDRAM refresh operations, it is beneficial to shorten the lifetime of stored data during the training process.
no code implementations • 19 Jul 2022 • Xin Dong, Sai Qian Zhang, Ang Li, H. T. Kung
Federated Learning aims to train a global model across multiple decentralized devices (i.e., clients) without exchanging their private local data.
no code implementations • 9 Jan 2022 • Sai Qian Zhang, Jieyu Lin, Qi Zhang
Federated learning (FL) is a training technique that enables client devices to jointly learn a shared model by aggregating locally-computed models without exposing their raw data.
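The standard aggregation rule in this setting is FedAvg-style weighted averaging; a minimal sketch is shown below as a representative baseline, not necessarily this paper's exact rule:

```python
import numpy as np

def fedavg(client_models, client_sizes):
    """Weighted average of locally trained models; only model parameters,
    never raw data, leave the clients. Each model is a list of numpy arrays."""
    total = sum(client_sizes)
    n_layers = len(client_models[0])
    return [
        sum((size / total) * model[layer]
            for model, size in zip(client_models, client_sizes))
        for layer in range(n_layers)
    ]
```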
no code implementations • 28 Oct 2021 • Sai Qian Zhang, Bradley McDanel, H. T. Kung
Block Floating Point (BFP) can efficiently support quantization for Deep Neural Network (DNN) training by providing a wide dynamic range via a shared exponent across a group of values.
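A minimal numpy sketch of the BFP idea, assuming the shared exponent is taken from the largest magnitude in each block (the block size and mantissa width are illustrative defaults; the paper's scheme may differ):

```python
import numpy as np

def bfp_quantize(values, mantissa_bits=8, block_size=16):
    """Quantize a 1-D array to Block Floating Point: every block of
    `block_size` values shares one exponent, set by its largest magnitude."""
    out = np.empty_like(values, dtype=np.float64)
    qmax = 2 ** (mantissa_bits - 1) - 1
    for start in range(0, len(values), block_size):
        block = values[start:start + block_size]
        _, exp = np.frexp(np.max(np.abs(block)))          # max magnitude < 2**exp
        scale = 2.0 ** (int(exp) - (mantissa_bits - 1))   # LSB weight for this block
        q = np.clip(np.round(block / scale), -qmax - 1, qmax)  # signed mantissas
        out[start:start + block_size] = q * scale
    return out
```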
no code implementations • 29 Sep 2021 • Sai Qian Zhang, Bradley McDanel
By leveraging the N:M sparsity constraint, we can identify the unimportant weights across each group of M weights at earlier stages of iterative pruning, which significantly lowers the cost of iterative training compared to conventional unstructured pruning.
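The N:M mask construction itself is simple; a generic sketch of the constraint (with an illustrative 2:4 default) follows:

```python
import numpy as np

def nm_prune_mask(weights, n=2, m=4):
    """Keep the n largest-magnitude weights in every group of m consecutive
    weights and zero the rest (assumes weights.size is divisible by m)."""
    groups = weights.reshape(-1, m)                     # one row per group of m
    keep = np.argsort(np.abs(groups), axis=1)[:, -n:]   # indices of the n largest
    mask = np.zeros_like(groups, dtype=bool)
    np.put_along_axis(mask, keep, True, axis=1)
    return mask.reshape(weights.shape)

# Usage: sparse_weights = weights * nm_prune_mask(weights, n=2, m=4)
```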
1 code implementation • NeurIPS 2020 • Sai Qian Zhang, Jieyu Lin, Qi Zhang
Recent studies have shown that introducing communication between agents can significantly improve overall performance in cooperative multi-agent reinforcement learning (MARL); a generic sketch of one such communication round follows this entry.
Tasks: Multi-agent Reinforcement Learning, Reinforcement Learning (RL)
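As a rough illustration of inter-agent communication, each agent can broadcast a message and condition on an aggregate of the others' messages (a generic scheme, not the paper's specific protocol):

```python
import numpy as np

def communication_round(hidden_states):
    """Each agent broadcasts its hidden state and receives the mean of the
    other agents' messages; the result augments each agent's policy input."""
    msgs = np.stack(hidden_states)                        # shape (n_agents, d)
    n = msgs.shape[0]
    mean_others = (msgs.sum(axis=0) - msgs) / (n - 1)     # exclude own message
    return np.concatenate([msgs, mean_others], axis=1)    # shape (n_agents, 2d)
```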
no code implementations • 13 Jul 2020 • H. T. Kung, Bradley McDanel, Sai Qian Zhang
To perform conversion from binary to SDR, we develop an efficient encoding method called HESE (Hybrid Encoding for Signed Expressions) that can be performed in one pass looking at only two bits at a time.
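The classic two-bit signed-digit recoding below conveys the flavor of such a one-pass conversion (a Booth-style illustration; HESE's exact encoding rules may differ):

```python
def signed_digit_recode(x, nbits=8):
    """One-pass binary-to-signed-digit conversion of an unsigned integer:
    digit i is b[i-1] - b[i], so each step inspects only two adjacent bits.
    Booth-style illustration; the paper's HESE rules may differ."""
    bits = [(x >> i) & 1 for i in range(nbits)] + [0]   # pad b[nbits] = 0
    digits, prev = [], 0                                # b[-1] = 0
    for b in bits:
        digits.append(prev - b)                         # digit in {-1, 0, +1}
        prev = b
    return digits   # x == sum(d << i for i, d in enumerate(digits))

# Example: signed_digit_recode(7, nbits=3) -> [-1, 0, 0, 1], i.e. 7 = 8 - 1,
# reducing three nonzero terms to two.
```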
1 code implementation • 8 Mar 2020 • Jieyu Lin, Kristina Dzeparoska, Sai Qian Zhang, Alberto Leon-Garcia, Nicolas Papernot
Our results on the StarCraft II multi-agent benchmark demonstrate that c-MARL teams are highly vulnerable to perturbations applied to one of their agents' observations (a sketch of such a perturbation follows this entry).
Tasks: Multi-agent Reinforcement Learning, Reinforcement Learning, +1 more
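A one-line example of such a perturbation is an FGSM-style step on a single agent's observation (a generic white-box attack shown for illustration; the paper's attack is more elaborate):

```python
import numpy as np

def fgsm_perturb(observation, loss_grad, eps=0.05):
    """Perturb one agent's observation along the sign of the gradient of
    the attacker's loss; eps bounds the per-element perturbation."""
    return observation + eps * np.sign(loss_grad)
```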
no code implementations • 4 Dec 2019 • Yuhang Li, Xin Dong, Sai Qian Zhang, Haoli Bai, Yuanpeng Chen, Wei Wang
We first identify three overlooked issues in extremely low-bit networks: the squashing range of quantized values, vanishing gradients during backpropagation, and the unexploited hardware acceleration of ternary networks.
2 code implementations • NeurIPS 2019 • Sai Qian Zhang, Qi Zhang, Jieyu Lin
Multi-agent reinforcement learning (MARL) has recently received considerable attention due to its applicability to a wide range of real-world applications.
Tasks: Multi-agent Reinforcement Learning, Reinforcement Learning, +3 more
no code implementations • 1 May 2019 • Bradley McDanel, Sai Qian Zhang, H. T. Kung, Xin Dong
A highlight of our full-stack approach, and a key contributor to the high energy efficiency achieved, is an efficient Selector-Accumulator (SAC) architecture for implementing the multiplier-accumulator (MAC) operation present in any digital CNN hardware.
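The idea behind a Selector-Accumulator can be sketched in software: once a weight is in signed-digit form (e.g., via a HESE-style recoding as above), each nonzero digit simply selects a shifted copy of the input to add or subtract (illustrative only; the actual SAC is a hardware circuit):

```python
def sac_multiply(x, weight_digits):
    """Multiply integer x by a weight given as signed digits (LSB first):
    select x shifted by each nonzero digit's position and accumulate."""
    acc = 0
    for i, d in enumerate(weight_digits):   # d in {-1, 0, +1}
        if d:
            acc += d * (x << i)             # shift-select, then add or subtract
    return acc

# Example: with digits [-1, 0, 0, 1] (weight 7), sac_multiply(5, [-1, 0, 0, 1]) == 35.
```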
no code implementations • 7 Nov 2018 • H. T. Kung, Bradley McDanel, Sai Qian Zhang
We study the effectiveness of this joint optimization for both high utilization and classification accuracy with ASIC and FPGA designs based on efficient bit-serial implementations of multiplier-accumulators.