1 code implementation • 8 Apr 2024 • Neelesh Gupta, Narayanan Kannan, Pengmiao Zhang, Viktor Prasanna
TabConv preserves over 93% of the original model's performance while reducing arithmetic operations by 36.5%, 25.8%, and 99.4% for ResNet-18 on CIFAR-10, CIFAR-100, and MNIST, respectively, 35.6% and 99.3% for ResNet-34 on CIFAR-10 and MNIST, and 98.9% for NIN on MNIST, achieving low-computation inference.
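The name points at the core trick: replacing convolution's multiply-accumulates with precomputed table lookups. As a rough illustration of that general idea, here is a minimal product-quantization-style sketch with made-up dimensions; it is not the paper's actual construction:

```python
import numpy as np

# Illustrative sketch: split each input vector into subvectors, snap each
# subvector to the nearest of K learned prototypes, and replace the dot
# products of a weight matrix with precomputed prototype-weight products.
rng = np.random.default_rng(0)

D, M, S, K = 16, 8, 4, 16          # input dim, output dim, subspaces, prototypes
sub = D // S                        # dimensions per subspace

W = rng.standard_normal((D, M))                 # weights of the layer to approximate
prototypes = rng.standard_normal((S, K, sub))   # per-subspace prototypes (learned offline)

# Precompute lookup tables: table[s, k, m] = prototypes[s, k] . W[s-th row block, m]
table = np.einsum('skd,sdm->skm', prototypes, W.reshape(S, sub, M))

def lut_matvec(x):
    """Approximate x @ W with S table lookups instead of D*M multiplies."""
    out = np.zeros(M)
    for s in range(S):
        xs = x[s * sub:(s + 1) * sub]
        # encode: index of the nearest prototype in this subspace
        k = np.argmin(((prototypes[s] - xs) ** 2).sum(axis=1))
        out += table[s, k]           # accumulate precomputed partial products
    return out

x = rng.standard_normal(D)
print(np.linalg.norm(lut_matvec(x) - x @ W))   # approximation error
```

At inference each subspace costs one nearest-prototype search and one table read, which is where the arithmetic savings in a lookup-based scheme come from.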
1 code implementation • 21 Feb 2024 • Neelesh Gupta, Pengmiao Zhang, Rajgopal Kannan, Viktor Prasanna
Deep neural networks (DNNs) have proven to be effective models for accurate Memory Access Prediction (MAP), a critical task in mitigating memory latency through data prefetching.
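As a concrete (hypothetical) shape for such a model, one common formulation treats the next address delta as a class to be predicted from a short history of past deltas; the PyTorch sketch below uses invented sizes and is not the paper's architecture:

```python
import torch
import torch.nn as nn

HIST, VOCAB, EMB = 8, 256, 32      # history length, # distinct deltas, embedding dim

class DeltaPredictor(nn.Module):
    """Toy DNN for memory access prediction: classify the next delta."""
    def __init__(self):
        super().__init__()
        self.emb = nn.Embedding(VOCAB, EMB)
        self.mlp = nn.Sequential(
            nn.Flatten(), nn.Linear(HIST * EMB, 128),
            nn.ReLU(), nn.Linear(128, VOCAB))

    def forward(self, hist):             # hist: (batch, HIST) delta ids
        return self.mlp(self.emb(hist))  # logits over the next delta

model = DeltaPredictor()
hist = torch.randint(0, VOCAB, (4, HIST))   # fake delta history
next_delta = model(hist).argmax(dim=-1)     # predicted next delta per sample
print(next_delta)
```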
no code implementations • 15 Feb 2024 • Kyle Marino, Pengmiao Zhang, Viktor Prasanna
We evaluate ME-ViT on systolic array sizes of 32 and 16, achieving up to 9.22$\times$ and 17.89$\times$ overall improvements in memory bandwidth, and a 2.16$\times$ improvement in throughput per DSP for both designs over state-of-the-art ViT accelerators on FPGA.
no code implementations • 23 Dec 2023 • Pengmiao Zhang, Neelesh Gupta, Rajgopal Kannan, Viktor K. Prasanna
DART accelerates inference by 170x for the large model and by 9.4x for the distilled model.
no code implementations • 10 Dec 2022 • Pengmiao Zhang, Rajgopal Kannan, Viktor K. Prasanna
Our predictors achieve 6.80-16.02% higher F1-score for delta prediction and 11.68-15.41% higher accuracy-at-10 for page prediction compared with LSTM and vanilla attention models.
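For reference, accuracy-at-10 is commonly computed by counting a prediction as correct when the true page appears among the model's top 10 candidates; a minimal sketch of that reading of the metric (not code from the paper):

```python
import numpy as np

def accuracy_at_k(logits, labels, k=10):
    """logits: (N, C) scores; labels: (N,) true class ids."""
    topk = np.argpartition(-logits, k, axis=1)[:, :k]   # indices of top-k scores
    hits = (topk == labels[:, None]).any(axis=1)        # true label in top-k?
    return hits.mean()

rng = np.random.default_rng(1)
logits = rng.standard_normal((1000, 500))
labels = rng.integers(0, 500, size=1000)
print(accuracy_at_k(logits, labels))   # ~ 10/500 = 0.02 for random scores
```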
no code implementations • 29 May 2022 • Pengmiao Zhang, Ajitesh Srivastava, Anant V. Nori, Rajgopal Kannan, Viktor K. Prasanna
Data prefetching is a technique that can hide memory latency by fetching data before it is needed by a program.
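A toy example makes the idea concrete: a classic stride prefetcher watches for a repeated constant stride between consecutive accesses and fetches the next address ahead of demand. The sketch below is purely illustrative; real hardware and ML-based prefetchers are far richer:

```python
def stride_prefetcher(trace, depth=1):
    """Yield (access, prefetch) pairs; prefetch is None without a stable stride."""
    prev, stride = None, None
    for addr in trace:
        prefetch = None
        if prev is not None:
            new_stride = addr - prev
            if new_stride == stride:               # same stride seen twice in a row
                prefetch = addr + depth * new_stride
            stride = new_stride
        prev = addr
        yield addr, prefetch

trace = [0x100, 0x140, 0x180, 0x1C0, 0x500, 0x540, 0x580]
for access, pf in stride_prefetcher(trace):
    print(hex(access), '-> prefetch', hex(pf) if pf else None)
```

When the access pattern is regular, the prefetch lands in cache before the program asks for it; irregular patterns are exactly where learned predictors like the ones above aim to help.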
1 code implementation • 1 May 2022 • Pengmiao Zhang, Ajitesh Srivastava, Anant V. Nori, Rajgopal Kannan, Viktor K. Prasanna
To reduce vocabulary size, we use fine-grained address segmentation as input.
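The motivation is that a raw 64-bit address space would imply an intractable 2^64-token vocabulary; splitting each address into fixed-width segments lets every position draw from a small vocabulary instead. A sketch of that idea follows, where the 8-bit segment width is an assumption for illustration, not necessarily the paper's choice:

```python
SEG_BITS = 8                      # bits per segment (illustrative choice)
N_SEG = 64 // SEG_BITS            # 8 segments covering a 64-bit address
MASK = (1 << SEG_BITS) - 1

def segment_address(addr):
    """Split a 64-bit address into N_SEG tokens, high bits first."""
    return [(addr >> (SEG_BITS * i)) & MASK
            for i in reversed(range(N_SEG))]

addr = 0x7FFE_1234_ABCD_5678
tokens = segment_address(addr)
print([hex(t) for t in tokens])   # 8 tokens, each in [0, 255]
# Per-position vocabulary shrinks from 2**64 to 2**SEG_BITS = 256.
```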