Search Results for author: Cliff Young

Found 9 papers, 5 papers with code

TPU v4: An Optically Reconfigurable Supercomputer for Machine Learning with Hardware Support for Embeddings

no code implementations • 4 Apr 2023 • Norman P. Jouppi, George Kurian, Sheng Li, Peter Ma, Rahul Nagarajan, Lifeng Nai, Nishant Patil, Suvinay Subramanian, Andy Swing, Brian Towles, Cliff Young, Xiang Zhou, Zongwei Zhou, David Patterson

For similar sized systems, it is ~4. 3x-4. 5x faster than the Graphcore IPU Bow and is 1. 2x-1. 7x faster and uses 1. 3x-1. 9x less power than the Nvidia A100.

Paper
Add Code

MegaBlocks: Efficient Sparse Training with Mixture-of-Experts

2 code implementations • 29 Nov 2022 • Trevor Gale, Deepak Narayanan, Cliff Young, Matei Zaharia

We present MegaBlocks, a system for efficient Mixture-of-Experts (MoE) training on GPUs.

1,063

Paper
Code

Exploring the limits of Concurrency in ML Training on Google TPUs

no code implementations • 7 Nov 2020 • Sameer Kumar, James Bradbury, Cliff Young, Yu Emma Wang, Anselm Levskaya, Blake Hechtman, Dehao Chen, HyoukJoong Lee, Mehmet Deveci, Naveen Kumar, Pankaj Kanwar, Shibo Wang, Skye Wanderman-Milne, Steve Lacy, Tao Wang, Tayo Oguntebi, Yazhou Zu, Yuanzhong Xu, Andy Swing

Recent results in language understanding using neural networks have required training hardware of unprecedentedscale, with thousands of chips cooperating on a single training run.

Paper
Add Code

Sparse GPU Kernels for Deep Learning

1 code implementation • 18 Jun 2020 • Trevor Gale, Matei Zaharia, Cliff Young, Erich Elsen

In this work, we study sparse matrices from deep learning applications and identify favorable properties that can be exploited to accelerate computation.

230

Paper
Code

Bit-Parallel Vector Composability for Neural Acceleration

no code implementations • 11 Apr 2020 • Soroush Ghodrati, Hardik Sharma, Cliff Young, Nam Sung Kim, Hadi Esmaeilzadeh

This paper explores a different design style, where each unit is only responsible for a slice of the bit-level operations to interleave and combine the benefits of bit-level parallelism with the abundant data-level parallelism in deep neural networks.

Paper
Add Code

MLPerf Training Benchmark

2 code implementations • 2 Oct 2019 • Peter Mattson, Christine Cheng, Cody Coleman, Greg Diamos, Paulius Micikevicius, David Patterson, Hanlin Tang, Gu-Yeon Wei, Peter Bailis, Victor Bittorf, David Brooks, Dehao Chen, Debojyoti Dutta, Udit Gupta, Kim Hazelwood, Andrew Hock, Xinyuan Huang, Atsushi Ike, Bill Jia, Daniel Kang, David Kanter, Naveen Kumar, Jeffery Liao, Guokai Ma, Deepak Narayanan, Tayo Oguntebi, Gennady Pekhimenko, Lillian Pentecost, Vijay Janapa Reddi, Taylor Robie, Tom St. John, Tsuguchika Tabaru, Carole-Jean Wu, Lingjie Xu, Masafumi Yamazaki, Cliff Young, Matei Zaharia

Machine learning (ML) needs industry-standard performance benchmarks to support design and competitive evaluation of the many emerging software and hardware solutions for ML.

Benchmarking BIG-bench Machine Learning

1,558

Paper
Code

Mesh-TensorFlow: Deep Learning for Supercomputers

1 code implementation • NeurIPS 2018 • Noam Shazeer, Youlong Cheng, Niki Parmar, Dustin Tran, Ashish Vaswani, Penporn Koanantakool, Peter Hawkins, HyoukJoong Lee, Mingsheng Hong, Cliff Young, Ryan Sepassi, Blake Hechtman

We use Mesh-TensorFlow to implement an efficient data-parallel, model-parallel version of the Transformer sequence-to-sequence model.

Ranked #10 on Language Modelling on One Billion Word

Language Modelling

1,557

Paper
Code

In-Datacenter Performance Analysis of a Tensor Processing Unit

no code implementations • 16 Apr 2017 • Norman P. Jouppi, Cliff Young, Nishant Patil, David Patterson, Gaurav Agrawal, Raminder Bajwa, Sarah Bates, Suresh Bhatia, Nan Boden, Al Borchers, Rick Boyle, Pierre-luc Cantin, Clifford Chao, Chris Clark, Jeremy Coriell, Mike Daley, Matt Dau, Jeffrey Dean, Ben Gelb, Tara Vazir Ghaemmaghami, Rajendra Gottipati, William Gulland, Robert Hagmann, C. Richard Ho, Doug Hogberg, John Hu, Robert Hundt, Dan Hurt, Julian Ibarz, Aaron Jaffey, Alek Jaworski, Alexander Kaplan, Harshit Khaitan, Andy Koch, Naveen Kumar, Steve Lacy, James Laudon, James Law, Diemthu Le, Chris Leary, Zhuyuan Liu, Kyle Lucke, Alan Lundin, Gordon MacKean, Adriana Maggiore, Maire Mahony, Kieran Miller, Rahul Nagarajan, Ravi Narayanaswami, Ray Ni, Kathy Nix, Thomas Norrie, Mark Omernick, Narayana Penukonda, Andy Phelps, Jonathan Ross, Matt Ross, Amir Salek, Emad Samadiani, Chris Severn, Gregory Sizikov, Matthew Snelham, Jed Souter, Dan Steinberg, Andy Swing, Mercedes Tan, Gregory Thorson, Bo Tian, Horia Toma, Erick Tuttle, Vijay Vasudevan, Richard Walter, Walter Wang, Eric Wilcox, Doe Hyun Yoon

Our workload, written in the high-level TensorFlow framework, uses production NN applications (MLPs, CNNs, and LSTMs) that represent 95% of our datacenters' NN inference demand.

Paper
Add Code

Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation

28 code implementations • 26 Sep 2016 • Yonghui Wu, Mike Schuster, Zhifeng Chen, Quoc V. Le, Mohammad Norouzi, Wolfgang Macherey, Maxim Krikun, Yuan Cao, Qin Gao, Klaus Macherey, Jeff Klingner, Apurva Shah, Melvin Johnson, Xiaobing Liu, Łukasz Kaiser, Stephan Gouws, Yoshikiyo Kato, Taku Kudo, Hideto Kazawa, Keith Stevens, George Kurian, Nishant Patil, Wei Wang, Cliff Young, Jason Smith, Jason Riesa, Alex Rudnick, Oriol Vinyals, Greg Corrado, Macduff Hughes, Jeffrey Dean

To improve parallelism and therefore decrease training time, our attention mechanism connects the bottom layer of the decoder to the top layer of the encoder.

Ranked #35 on Machine Translation on WMT2014 English-French

Decoder Machine Translation +3

12,669

Paper
Code

Cannot find the paper you are looking for? You can Submit a new open access paper.