1 code implementation • NeurIPS 2023 • Phitchaya Mangpo Phothilimthana, Sami Abu-El-Haija, Kaidi Cao, Bahare Fatemi, Mike Burrows, Charith Mendis, Bryan Perozzi
TpuGraphs provides 25x more graphs than the largest graph property prediction dataset (with comparable graph sizes), and 770x larger graphs on average compared to existing performance prediction datasets on machine learning programs.
no code implementations • 27 Jun 2023 • Ahan Gupta, Yueming Yuan, Yanqi Zhou, Charith Mendis
FLuRKA models provide sizable performance gains over these approximate techniques and are of high quality.
no code implementations • 27 Jun 2023 • Damitha Lenadora, Vimarsh Sathia, Gerasimos Gerogiannis, Serif Yesil, Josep Torrellas, Charith Mendis
We leverage this observation to propose SENSEi, a system that exposes different sparse and dense matrix primitive compositions based on different matrix re-associations of GNN computations and selects the best among them based on input attributes.
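The re-association idea can be illustrated with a GCN-style layer H = A·X·W, where A is a sparse adjacency matrix and X, W are dense. The sketch below is hypothetical: it uses a toy multiply-add count, not SENSEi's actual cost model or selection mechanism, and the helper names (`cost_ax_first`, `cost_xw_first`, `choose_order`, `spmm`, `gemm`) are illustrative only. It checks that both orders compute the same matrix and picks the cheaper one under the toy model.

```python
# Toy FLOP model (hypothetical, not SENSEi's actual cost model) for a
# GCN-style layer H = A @ X @ W, where A is an n x n sparse matrix with nnz
# nonzeros, X is dense n x f_in, and W is dense f_in x f_out.

def cost_ax_first(nnz, n, f_in, f_out):
    # (A @ X) @ W: sparse-dense product over f_in columns, then a dense GEMM.
    return nnz * f_in + n * f_in * f_out


def cost_xw_first(nnz, n, f_in, f_out):
    # A @ (X @ W): dense GEMM first, then a sparse-dense product over f_out columns.
    return n * f_in * f_out + nnz * f_out


def choose_order(nnz, n, f_in, f_out):
    """Return the cheaper association order under the toy model above."""
    if cost_xw_first(nnz, n, f_in, f_out) < cost_ax_first(nnz, n, f_in, f_out):
        return "A(XW)"
    return "(AX)W"


def spmm(a_coo, b, n_rows):
    """Sparse matrix (list of (row, col, value) triples) times a dense matrix (list of rows)."""
    out = [[0.0] * len(b[0]) for _ in range(n_rows)]
    for r, c, v in a_coo:
        for j, bv in enumerate(b[c]):
            out[r][j] += v * bv
    return out


def gemm(x, w):
    """Dense matrix product of x (m x k) and w (k x p), both lists of rows."""
    return [[sum(row[k] * w[k][j] for k in range(len(w))) for j in range(len(w[0]))]
            for row in x]


if __name__ == "__main__":
    A = [(0, 1, 1.0), (1, 0, 1.0), (1, 2, 1.0), (2, 1, 1.0)]  # 3 x 3, nnz = 4
    X = [[1, 2, 0, 1], [0, 1, 3, 1], [2, 0, 1, 0]]            # 3 x 4
    W = [[1, 0], [0, 1], [1, 1], [2, -1]]                     # 4 x 2
    # Both orders compute the same matrix; only the cost differs.
    assert gemm(spmm(A, X, 3), W) == spmm(A, gemm(X, W), 3)
    print(choose_order(nnz=4, n=3, f_in=4, f_out=2))  # prints "A(XW)"
```

Because the dense GEMM term is identical in both orders, this toy model reduces to comparing f_in with f_out; a real selector that depends on input attributes would likely also account for sparsity structure and kernel efficiency.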
1 code implementation • NeurIPS 2023 • Kaidi Cao, Phitchaya Mangpo Phothilimthana, Sami Abu-El-Haija, Dustin Zelle, Yanqi Zhou, Charith Mendis, Jure Leskovec, Bryan Perozzi
Here we propose Graph Segment Training (GST), a general framework that utilizes a divide-and-conquer approach to allow learning large graph property prediction with a constant memory footprint.
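A minimal sketch of the divide-and-conquer idea behind a constant memory footprint, assuming a hypothetical per-segment encoder and sum aggregation: nodes are chunked into fixed-size segments, each segment is encoded independently, and only a running aggregate is kept. It is forward-only, ignores edges that cross segment boundaries, and is not GST's actual training procedure; `segmented_graph_embedding` and the encoder are illustrative names.

```python
from typing import Callable, Sequence, Tuple


def segmented_graph_embedding(
    node_features: Sequence[float],
    segment_size: int,
    encode_segment: Callable[[Sequence[float]], float],
) -> Tuple[float, int]:
    """Split nodes into fixed-size segments, encode each independently, and
    sum-aggregate the segment embeddings.

    Only one segment is materialized at a time, so the working set is bounded
    by segment_size no matter how many nodes the graph has. Returns the graph
    embedding and the size of the largest segment actually processed.
    """
    if segment_size <= 0:
        raise ValueError("segment_size must be positive")
    total = 0.0
    largest = 0
    for start in range(0, len(node_features), segment_size):
        segment = node_features[start:start + segment_size]
        largest = max(largest, len(segment))
        total += encode_segment(segment)
    return total, largest


if __name__ == "__main__":
    # Hypothetical encoder: sum of the node feature values in a segment.
    encoder = lambda seg: float(sum(seg))
    for n_nodes in (100, 10_000):
        emb, largest = segmented_graph_embedding(list(range(n_nodes)), 16, encoder)
        # The largest processed segment stays at 16 nodes as the graph grows.
        print(n_nodes, emb, largest)
```

The point of the design is that `largest` does not grow with the number of nodes, which is what makes per-step memory independent of graph size.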
1 code implementation • 14 Feb 2023 • Isha Chaudhary, Alex Renda, Charith Mendis, Gagandeep Singh
We generate and compare COMET's explanations for the popular neural cost model Ithemal against those for an accurate CPU simulation-based cost model, uiCA.
1 code implementation • 8 Oct 2022 • Ondrej Sykora, Phitchaya Mangpo Phothilimthana, Charith Mendis, Amir Yazdanbakhsh
In this paper, we introduce GRANITE, a new machine learning model that estimates the throughput of basic blocks across different microarchitectures.
2 code implementations • 8 Oct 2020 • Alex Renda, Yishen Chen, Charith Mendis, Michael Carbin
In this paper, we present DiffTune, a system for learning the parameters of x86 basic block CPU simulators from coarse-grained end-to-end measurements.
no code implementations • 3 Aug 2020 • Samuel J. Kaufman, Phitchaya Mangpo Phothilimthana, Yanqi Zhou, Charith Mendis, Sudip Roy, Amit Sabne, Mike Burrows
Accurate hardware performance models are critical to efficient code generation.
1 code implementation • NeurIPS 2019 • Charith Mendis, Cambridge Yang, Yewen Pu, Saman Amarasinghe, Michael Carbin
We show that the learnt policy produces a vectorization scheme that is better than industry-standard compiler heuristics in terms of both static measures and runtime performance.
3 code implementations • 21 Aug 2018 • Charith Mendis, Alex Renda, Saman Amarasinghe, Michael Carbin
Predicting the number of clock cycles a processor takes to execute a block of assembly instructions in steady state (the throughput) is important for both compiler designers and performance engineers.
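To make "throughput in steady state" concrete: if running a block k times costs roughly a fixed overhead plus k times the per-iteration cost, differencing two runs with different k cancels the overhead. This is a common estimation technique shown under that linear assumption, not necessarily the measurement method used in this paper; `steady_state_throughput` is a hypothetical helper and the cycle counts are synthetic, not measurements.

```python
def steady_state_throughput(cycles_k1: float, k1: int,
                            cycles_k2: float, k2: int) -> float:
    """Estimate cycles per iteration of a basic block from two runs that
    execute it k1 and k2 times, assuming total_cycles ~= overhead + k * throughput.
    Differencing the two runs cancels the fixed overhead (e.g., loop setup)."""
    if k1 == k2:
        raise ValueError("iteration counts must differ")
    return (cycles_k2 - cycles_k1) / (k2 - k1)


if __name__ == "__main__":
    # Synthetic counts (not measurements): 120 cycles of overhead, 2.5 cycles/iteration.
    print(steady_state_throughput(370.0, 100, 620.0, 200))  # prints 2.5
```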