Search Results for author: Utku Evci

Found 22 papers, 12 papers with code

Progressive Gradient Flow for Robust N:M Sparsity Training in Transformers

1 code implementation • 7 Feb 2024 • Abhimanyu Rajeshkumar Bambhaniya, Amir Yazdanbakhsh, Suvinay Subramanian, Sheng-Chun Kao, Shivani Agrawal, Utku Evci, Tushar Krishna

In this work, we study the effectiveness of existing sparse training recipes at high-sparsity regions and argue that these methods fail to sustain the model quality on par with low-sparsity regions.

Scaling Laws for Sparsely-Connected Foundation Models

no code implementations • 15 Sep 2023 • Elias Frantar, Carlos Riquelme, Neil Houlsby, Dan Alistarh, Utku Evci

We explore the impact of parameter sparsity on the scaling behavior of Transformers trained on massive datasets (i.e., "foundation models"), in both vision and language domains.

Computational Efficiency

Dynamic Sparse Training with Structured Sparsity

1 code implementation • 3 May 2023 • Mike Lasby, Anna Golubeva, Utku Evci, Mihai Nica, Yani Ioannou

Dynamic Sparse Training (DST) methods achieve state-of-the-art results in sparse neural network training, matching the generalization of dense models while enabling sparse training and inference.

JaxPruner: A concise library for sparsity research

1 code implementation • 27 Apr 2023 • Joo Hyung Lee, Wonpyo Park, Nicole Mitchell, Jonathan Pilault, Johan Obando-Ceron, Han-Byul Kim, Namhoon Lee, Elias Frantar, Yun Long, Amir Yazdanbakhsh, Shivani Agrawal, Suvinay Subramanian, Xin Wang, Sheng-Chun Kao, Xingyao Zhang, Trevor Gale, Aart Bik, Woohyun Han, Milen Ferev, Zhonglin Han, Hong-Seok Kim, Yann Dauphin, Gintare Karolina Dziugaite, Pablo Samuel Castro, Utku Evci

This paper introduces JaxPruner, an open-source JAX-based pruning and sparse training library for machine learning research.

The Dormant Neuron Phenomenon in Deep Reinforcement Learning

1 code implementation • 24 Feb 2023 • Ghada Sokar, Rishabh Agarwal, Pablo Samuel Castro, Utku Evci

In this work we identify the dormant neuron phenomenon in deep reinforcement learning, where an agent's network suffers from an increasing number of inactive neurons, thereby affecting network expressivity.

Reinforcement Learning (RL)

Scaling Vision Transformers to 22 Billion Parameters

1 code implementation • 10 Feb 2023 • Mostafa Dehghani, Josip Djolonga, Basil Mustafa, Piotr Padlewski, Jonathan Heek, Justin Gilmer, Andreas Steiner, Mathilde Caron, Robert Geirhos, Ibrahim Alabdulmohsin, Rodolphe Jenatton, Lucas Beyer, Michael Tschannen, Anurag Arnab, Xiao Wang, Carlos Riquelme, Matthias Minderer, Joan Puigcerver, Utku Evci, Manoj Kumar, Sjoerd van Steenkiste, Gamaleldin F. Elsayed, Aravindh Mahendran, Fisher Yu, Avital Oliver, Fantine Huot, Jasmijn Bastings, Mark Patrick Collier, Alexey Gritsenko, Vighnesh Birodkar, Cristina Vasconcelos, Yi Tay, Thomas Mensink, Alexander Kolesnikov, Filip Pavetić, Dustin Tran, Thomas Kipf, Mario Lučić, Xiaohua Zhai, Daniel Keysers, Jeremiah Harmsen, Neil Houlsby

The scaling of Transformers has driven breakthrough capabilities for language models.

Ranked #1 on Zero-Shot Transfer Image Classification on ObjectNet

Action Classification Fairness +3

Training Recipe for N:M Structured Sparsity with Decaying Pruning Mask

no code implementations • 15 Sep 2022 • Sheng-Chun Kao, Amir Yazdanbakhsh, Suvinay Subramanian, Shivani Agrawal, Utku Evci, Tushar Krishna

In this work, we focus on N:M sparsity and extensively study and evaluate various training recipes for N:M sparsity in terms of the trade-off between model accuracy and compute cost (FLOPs).

The State of Sparse Training in Deep Reinforcement Learning

1 code implementation • 17 Jun 2022 • Laura Graesser, Utku Evci, Erich Elsen, Pablo Samuel Castro

The use of sparse neural networks has seen rapid growth in recent years, particularly in computer vision.

Reinforcement Learning (RL)

GradMax: Growing Neural Networks using Gradient Information

1 code implementation • ICLR 2022 • Utku Evci, Bart van Merriënboer, Thomas Unterthiner, Max Vladymyrov, Fabian Pedregosa

The architecture and the parameters of neural networks are often optimized independently, which requires costly retraining of the parameters whenever the architecture is modified.

Head2Toe: Utilizing Intermediate Representations for Better Transfer Learning

1 code implementation • 10 Jan 2022 • Utku Evci, Vincent Dumoulin, Hugo Larochelle, Michael C. Mozer

We propose a method, Head-to-Toe probing (Head2Toe), that selects features from all layers of the source model to train a classification head for the target-domain.

Transfer Learning

Head2Toe: Utilizing Intermediate Representations for Better OOD Generalization

no code implementations • 29 Sep 2021 • Utku Evci, Vincent Dumoulin, Hugo Larochelle, Michael Curtis Mozer

We propose a method, Head-to-Toe probing (Head2Toe), that selects features from all layers of the source model to train a classification head for the target-domain.

Transfer Learning

Comparing Transfer and Meta Learning Approaches on a Unified Few-Shot Classification Benchmark

1 code implementation • 6 Apr 2021 • Vincent Dumoulin, Neil Houlsby, Utku Evci, Xiaohua Zhai, Ross Goroshin, Sylvain Gelly, Hugo Larochelle

To bridge this gap, we perform a cross-family study of the best transfer and meta learners on both a large-scale meta-learning benchmark (Meta-Dataset, MD), and a transfer learning benchmark (Visual Task Adaptation Benchmark, VTAB).

Few-Shot Learning General Classification +1

Practical Real Time Recurrent Learning with a Sparse Approximation

no code implementations • ICLR 2021 • Jacob Menick, Erich Elsen, Utku Evci, Simon Osindero, Karen Simonyan, Alex Graves

For highly sparse networks, SnAp with $n=2$ remains tractable and can outperform backpropagation through time in terms of learning speed when updates are done online.

Gradient Flow in Sparse Neural Networks and How Lottery Tickets Win

1 code implementation • 7 Oct 2020 • Utku Evci, Yani A. Ioannou, Cem Keskin, Yann Dauphin

Sparse Neural Networks (NNs) can match the generalization of dense NNs using a fraction of the compute/storage for inference, and also have the potential to enable efficient training.

A Practical Sparse Approximation for Real Time Recurrent Learning

no code implementations • 12 Jun 2020 • Jacob Menick, Erich Elsen, Utku Evci, Simon Osindero, Karen Simonyan, Alex Graves

Current methods for training recurrent neural networks are based on backpropagation through time, which requires storing a complete history of network states, and prohibits updating the weights "online" (after every timestep).

Rigging the Lottery: Making All Tickets Winners

10 code implementations • ICML 2020 • Utku Evci, Trevor Gale, Jacob Menick, Pablo Samuel Castro, Erich Elsen

There is a large body of work on training dense networks to yield sparse networks for inference, but this limits the size of the largest trainable sparse model to that of the largest trainable dense model.

Ranked #1 on Sparse Learning on ImageNet

Image Classification Language Modelling +1

Natural Language Understanding with the Quora Question Pairs Dataset

no code implementations • 1 Jul 2019 • Lakshay Sharma, Laura Graesser, Nikita Nangia, Utku Evci

This paper explores the task of Natural Language Understanding (NLU) by looking at duplicate question detection in the Quora dataset.

BIG-bench Machine Learning Natural Language Understanding

The Difficulty of Training Sparse Neural Networks

no code implementations • ICML Workshop Deep_Phenomen 2019 • Utku Evci, Fabian Pedregosa, Aidan Gomez, Erich Elsen

Additionally, our attempts to find a decreasing objective path from "bad" solutions to the "good" ones in the sparse subspace fail.

Mean Replacement Pruning

no code implementations • ICLR 2019 • Utku Evci, Nicolas Le Roux, Pablo Castro, Leon Bottou

Finally, we show that the units selected by the best performing scoring functions are somewhat consistent over the course of training, implying the dead parts of the network appear during the stages of training.

Meta-Dataset: A Dataset of Datasets for Learning to Learn from Few Examples

13 code implementations • ICLR 2020 • Eleni Triantafillou, Tyler Zhu, Vincent Dumoulin, Pascal Lamblin, Utku Evci, Kelvin Xu, Ross Goroshin, Carles Gelada, Kevin Swersky, Pierre-Antoine Manzagol, Hugo Larochelle

Few-shot classification refers to learning a classifier for new classes given only a few examples.

Ranked #7 on Few-Shot Image Classification on Meta-Dataset Rank

Few-Shot Image Classification General Classification +1

Detecting Dead Weights and Units in Neural Networks

no code implementations • 15 Jun 2018 • Utku Evci

We propose an efficient way for detecting dead units and use it to select which units to prune.

Quantization

Empirical Analysis of the Hessian of Over-Parametrized Neural Networks

no code implementations • ICLR 2018 • Levent Sagun, Utku Evci, V. Ugur Guney, Yann Dauphin, Leon Bottou

In particular, we present a case that links the two observations: small and large batch gradient descent appear to converge to different basins of attraction but we show that they are in fact connected through their flat region and so belong to the same basin.
