1 code implementation • 8 Apr 2024 • Neelesh Gupta, Narayanan Kannan, Pengmiao Zhang, Viktor Prasanna
TabConv preserves over 93% of the original model's performance while reducing arithmetic operations by 36.5%, 25.8%, and 99.4% for ResNet-18 on CIFAR-10, CIFAR-100, and MNIST, respectively, 35.6% and 99.3% for ResNet-34 on CIFAR-10 and MNIST, and 98.9% for NIN on MNIST, achieving low-computation inference.
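The name points at the core trick: replacing convolution's multiply-accumulates with precomputed table lookups. As a rough illustration of that general idea, here is a minimal product-quantization-style sketch with made-up dimensions; it is not the paper's actual construction:

```python
import numpy as np

# Illustrative sketch: split each input vector into subvectors, snap each
# subvector to the nearest of K learned prototypes, and replace the dot
# products of a weight matrix with precomputed prototype-weight products.
rng = np.random.default_rng(0)

D, M, S, K = 16, 8, 4, 16          # input dim, output dim, subspaces, prototypes
sub = D // S                        # dimensions per subspace

W = rng.standard_normal((D, M))                 # weights of the layer to approximate
prototypes = rng.standard_normal((S, K, sub))   # per-subspace prototypes (learned offline)

# Precompute lookup tables: table[s, k, m] = prototypes[s, k] . W[s-th row block, m]
table = np.einsum('skd,sdm->skm', prototypes, W.reshape(S, sub, M))

def lut_matvec(x):
    """Approximate x @ W with S table lookups instead of D*M multiplies."""
    out = np.zeros(M)
    for s in range(S):
        xs = x[s * sub:(s + 1) * sub]
        # encode: index of the nearest prototype in this subspace
        k = np.argmin(((prototypes[s] - xs) ** 2).sum(axis=1))
        out += table[s, k]           # accumulate precomputed partial products
    return out

x = rng.standard_normal(D)
print(np.linalg.norm(lut_matvec(x) - x @ W))   # approximation error
```

At inference each subspace costs one nearest-prototype search and one table read, which is where the arithmetic savings in a lookup-based scheme come from.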
1 code implementation • 21 Feb 2024 • Neelesh Gupta, Pengmiao Zhang, Rajgopal Kannan, Viktor Prasanna
Deep neural networks (DNNs) have proven to be effective models for accurate Memory Access Prediction (MAP), a critical task in mitigating memory latency through data prefetching.
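As a concrete (hypothetical) shape for such a model, one common formulation treats the next address delta as a class to be predicted from a short history of past deltas; the PyTorch sketch below uses invented sizes and is not the paper's architecture:

```python
import torch
import torch.nn as nn

HIST, VOCAB, EMB = 8, 256, 32      # history length, # distinct deltas, embedding dim

class DeltaPredictor(nn.Module):
    """Toy DNN for memory access prediction: classify the next delta."""
    def __init__(self):
        super().__init__()
        self.emb = nn.Embedding(VOCAB, EMB)
        self.mlp = nn.Sequential(
            nn.Flatten(), nn.Linear(HIST * EMB, 128),
            nn.ReLU(), nn.Linear(128, VOCAB))

    def forward(self, hist):             # hist: (batch, HIST) delta ids
        return self.mlp(self.emb(hist))  # logits over the next delta

model = DeltaPredictor()
hist = torch.randint(0, VOCAB, (4, HIST))   # fake delta history
next_delta = model(hist).argmax(dim=-1)     # predicted next delta per sample
print(next_delta)
```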
no code implementations • 15 Feb 2024 • Kyle Marino, Pengmiao Zhang, Viktor Prasanna
We evaluate ME-ViT on systolic array sizes of 32 and 16, achieving up to 9.22$\times$ and 17.89$\times$ overall improvements in memory bandwidth, and a 2.16$\times$ improvement in throughput per DSP for both designs over state-of-the-art ViT accelerators on FPGA.
no code implementations • 23 Dec 2023 • Pengmiao Zhang, Neelesh Gupta, Rajgopal Kannan, Viktor K. Prasanna
DART accelerates inference by 170x for the large model and by 9.4x for the distilled model.
no code implementations • 10 Dec 2022 • Pengmiao Zhang, Rajgopal Kannan, Viktor K. Prasanna
Our predictors achieve 6.80-16.02% higher F1-score for delta prediction and 11.68-15.41% higher accuracy-at-10 for page prediction compared with LSTM and vanilla attention models.
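For reference, accuracy-at-10 is commonly computed by counting a prediction as correct when the true page appears among the model's top 10 candidates; a minimal sketch of that reading of the metric (not code from the paper):

```python
import numpy as np

def accuracy_at_k(logits, labels, k=10):
    """logits: (N, C) scores; labels: (N,) true class ids."""
    topk = np.argpartition(-logits, k, axis=1)[:, :k]   # indices of top-k scores
    hits = (topk == labels[:, None]).any(axis=1)        # true label in top-k?
    return hits.mean()

rng = np.random.default_rng(1)
logits = rng.standard_normal((1000, 500))
labels = rng.integers(0, 500, size=1000)
print(accuracy_at_k(logits, labels))   # ~ 10/500 = 0.02 for random scores
```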
no code implementations • 29 May 2022 • Pengmiao Zhang, Ajitesh Srivastava, Anant V. Nori, Rajgopal Kannan, Viktor K. Prasanna
Data prefetching is a technique that can hide memory latency by fetching data before it is needed by a program.
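A toy example makes the idea concrete: a classic stride prefetcher watches for a repeated constant stride between consecutive accesses and fetches the next address ahead of demand. The sketch below is purely illustrative; real hardware and ML-based prefetchers are far richer:

```python
def stride_prefetcher(trace, depth=1):
    """Yield (access, prefetch) pairs; prefetch is None without a stable stride."""
    prev, stride = None, None
    for addr in trace:
        prefetch = None
        if prev is not None:
            new_stride = addr - prev
            if new_stride == stride:               # same stride seen twice in a row
                prefetch = addr + depth * new_stride
            stride = new_stride
        prev = addr
        yield addr, prefetch

trace = [0x100, 0x140, 0x180, 0x1C0, 0x500, 0x540, 0x580]
for access, pf in stride_prefetcher(trace):
    print(hex(access), '-> prefetch', hex(pf) if pf else None)
```

When the access pattern is regular, the prefetch lands in cache before the program asks for it; irregular patterns are exactly where learned predictors like the ones above aim to help.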
1 code implementation • 1 May 2022 • Pengmiao Zhang, Ajitesh Srivastava, Anant V. Nori, Rajgopal Kannan, Viktor K. Prasanna
To reduce vocabulary size, we use fine-grained address segmentation as input.
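The motivation is that a raw 64-bit address space would imply an intractable 2^64-token vocabulary; splitting each address into fixed-width segments lets every position draw from a small vocabulary instead. A sketch of that idea follows, where the 8-bit segment width is an assumption for illustration, not necessarily the paper's choice:

```python
SEG_BITS = 8                      # bits per segment (illustrative choice)
N_SEG = 64 // SEG_BITS            # 8 segments covering a 64-bit address
MASK = (1 << SEG_BITS) - 1

def segment_address(addr):
    """Split a 64-bit address into N_SEG tokens, high bits first."""
    return [(addr >> (SEG_BITS * i)) & MASK
            for i in reversed(range(N_SEG))]

addr = 0x7FFE_1234_ABCD_5678
tokens = segment_address(addr)
print([hex(t) for t in tokens])   # 8 tokens, each in [0, 255]
# Per-position vocabulary shrinks from 2**64 to 2**SEG_BITS = 256.
```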