1 code implementation • 5 Mar 2020 • Peng Zhang, Jianbin Fang, Canqun Yang, Chun Huang, Tao Tang, Zheng Wang
This article presents an automatic approach to quickly derive a good solution for hardware resource partition and task granularity for task-based parallel applications on heterogeneous many-core architectures.
no code implementations • 20 Nov 2019 • Donglin Chen, Jianbin Fang, Chuanfu Xu, Shizhao Chen, Zheng Wang
Understanding the scalability of parallel programs is crucial for software optimization and hardware architecture design.
no code implementations • 21 Oct 2018 • Qing Qin, Jie Ren, Jialong Yu, Ling Gao, Hai Wang, Jie Zheng, Yansong Feng, Jianbin Fang, Zheng Wang
We experimentally show that how two mainstream compression techniques, data quantization and pruning, perform on these network architectures and the implications of compression techniques to the model storage size, inference time, energy consumption and performance metrics.
no code implementations • 8 Feb 2018 • Peng Zhang, Jianbin Fang, Tao Tang, Canqun Yang, Zheng Wang
In this paper, we present an automatic approach to determine the hardware resource partition and the task granularity for any given application, targeting the Intel XeonPhi architecture.
Performance