no code implementations • 1 Mar 2024 • Vithursan Thangarasa, Mahmoud Salem, Shreyas Saxena, Kevin Leong, Joel Hestness, Sean Lie
Large language models (LLMs) are typically trained on general source data for various domains, but a recent surge in domain-specific LLMs has shown their potential to outperform general-purpose models in domain-specific tasks (e.g., biomedicine).
Ranked #10 on Question Answering on PubMedQA
1 code implementation • 18 Oct 2023 • Faisal Al-Khateeb, Nolan Dey, Daria Soboleva, Joel Hestness
Linear position interpolation helps pre-trained models using rotary position embeddings (RoPE) to extrapolate to longer sequence lengths.
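A minimal sketch of the linear position interpolation idea with RoPE, assuming the standard rotary formulation: positions beyond the trained context are rescaled back into the trained range before the rotary angles are computed. The lengths and dimensions here are illustrative assumptions, not taken from the paper's code.

```python
import numpy as np

def rope_angles(positions, dim, base=10000.0, scale=1.0):
    # Standard RoPE angles theta_i = pos * base^(-2i/dim); linear position
    # interpolation simply rescales positions so a longer sequence maps
    # back into the position range seen during pre-training.
    inv_freq = base ** (-np.arange(0, dim, 2) / dim)
    return np.outer(positions * scale, inv_freq)

# Illustrative lengths: trained on 2048 tokens, extrapolating to 8192.
train_len, target_len = 2048, 8192
positions = np.arange(target_len)
angles = rope_angles(positions, dim=64, scale=train_len / target_len)
cos, sin = np.cos(angles), np.sin(angles)  # rotate query/key pairs as usual
```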
1 code implementation • 20 Sep 2023 • Nolan Dey, Daria Soboleva, Faisal Al-Khateeb, Bowen Yang, Ribhu Pathria, Hemant Khachane, Shaheer Muhammad, Zhiming Chen, Robert Myers, Jacob Robert Steeves, Natalia Vassilieva, Marvin Tom, Joel Hestness
BTLM-3B-8K is available under an Apache 2.0 license on Hugging Face: https://huggingface.co/cerebras/btlm-3b-8k-base.
1 code implementation • 19 Sep 2023 • Zhiqiang Shen, Tianhua Tao, Liqun Ma, Willie Neiswanger, Zhengzhong Liu, Hongyi Wang, Bowen Tan, Joel Hestness, Natalia Vassilieva, Daria Soboleva, Eric Xing
This paper aims to understand the impacts of various data combinations (e.g., web text, Wikipedia, GitHub, books) on the pretraining of large language models using SlimPajama.
no code implementations • 30 Aug 2023 • Neha Sengupta, Sunil Kumar Sahu, Bokang Jia, Satheesh Katipomu, Haonan Li, Fajri Koto, William Marshall, Gurpreet Gosal, Cynthia Liu, Zhiming Chen, Osama Mohammed Afzal, Samta Kamboj, Onkar Pandit, Rahul Pal, Lalit Pradhan, Zain Muhammad Mujahid, Massa Baali, Xudong Han, Sondos Mahmoud Bsharat, Alham Fikri Aji, Zhiqiang Shen, Zhengzhong Liu, Natalia Vassilieva, Joel Hestness, Andy Hock, Andrew Feldman, Jonathan Lee, Andrew Jackson, Hector Xuguang Ren, Preslav Nakov, Timothy Baldwin, Eric Xing
We release two open versions of the model -- the foundation Jais model, and an instruction-tuned Jais-chat variant -- with the aim of promoting research on Arabic LLMs.
2 code implementations • 6 Apr 2023 • Nolan Dey, Gurpreet Gosal, Zhiming Chen, Hemant Khachane, William Marshall, Ribhu Pathria, Marvin Tom, Joel Hestness
We study recent research advances that improve large language models through efficient pre-training and scaling, as well as open datasets and tools.
1 code implementation • 28 Jun 2022 • Vitaliy Chiley, Vithursan Thangarasa, Abhay Gupta, Anshul Samar, Joel Hestness, Dennis Decoste
However, training them requires substantial accelerator memory for saving large, multi-resolution activations.
Ranked #313 on Image Classification on ImageNet (using extra training data)
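The activation-memory pressure described in this entry is what reversible designs address: when a block's inputs can be recomputed exactly from its outputs, intermediate activations need not be stored for the backward pass. A minimal NumPy sketch of the generic RevNet-style coupling (not the paper's actual RevBiFPN blocks; f and g are stand-ins for arbitrary sub-networks):

```python
import numpy as np

def f(x):  # stand-in for an arbitrary sub-network
    return np.tanh(x)

def g(x):  # another stand-in
    return 0.5 * x

def rev_forward(x1, x2):
    # Reversible coupling: the outputs fully determine the inputs,
    # so x1 and x2 need not be kept in accelerator memory.
    y1 = x1 + f(x2)
    y2 = x2 + g(y1)
    return y1, y2

def rev_inverse(y1, y2):
    # Recompute the inputs from the outputs during backprop.
    x2 = y2 - g(y1)
    x1 = y1 - f(x2)
    return x1, x2

x1, x2 = np.random.randn(4), np.random.randn(4)
assert np.allclose((x1, x2), rev_inverse(*rev_forward(x1, x2)))
```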
no code implementations • 17 Mar 2022 • Ehsan Valavi, Joel Hestness, Newsha Ardalani, Marco Iansiti
In addition, we argue that increasing the stock of data by including older datasets may, in fact, damage the model's accuracy.
no code implementations • 17 Mar 2022 • Ehsan Valavi, Joel Hestness, Marco Iansiti, Newsha Ardalani, Feng Zhu, Karim R. Lakhani
Relating the text topics to various business areas of interest, we argue that competing in a business area where data value decays rapidly changes the strategies needed to acquire a competitive advantage.
1 code implementation • 6 Jan 2022 • Yuanpeng Li, Joel Hestness, Mohamed Elhoseiny, Liang Zhao, Kenneth Church
This paper proposes an efficient approach to learning disentangled representations with causal mechanisms, based on the difference of conditional probabilities between the original and new distributions.
no code implementations • 19 Apr 2021 • Mihir Pendse, Vithursan Thangarasa, Vitaliy Chiley, Ryan Holmdahl, Joel Hestness, Dennis Decoste
The inverted residual bottleneck block uses lightweight depthwise separable convolutions to reduce computation by decomposing convolutions into a pointwise convolution and a depthwise convolution.
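For illustration, a PyTorch sketch of an inverted residual bottleneck in the MobileNetV2 style this entry describes: a 1x1 pointwise expansion, a 3x3 depthwise convolution, and a 1x1 pointwise projection. The channel counts and expansion factor are illustrative assumptions, not the paper's configuration.

```python
import torch
import torch.nn as nn

class InvertedResidual(nn.Module):
    """Inverted residual bottleneck (MobileNetV2-style), for illustration."""
    def __init__(self, in_ch, out_ch, stride=1, expand=6):
        super().__init__()
        mid = in_ch * expand
        self.use_skip = stride == 1 and in_ch == out_ch
        self.block = nn.Sequential(
            nn.Conv2d(in_ch, mid, 1, bias=False),   # 1x1 pointwise expansion
            nn.BatchNorm2d(mid), nn.ReLU6(inplace=True),
            nn.Conv2d(mid, mid, 3, stride, 1,       # 3x3 depthwise:
                      groups=mid, bias=False),      # one filter per channel
            nn.BatchNorm2d(mid), nn.ReLU6(inplace=True),
            nn.Conv2d(mid, out_ch, 1, bias=False),  # 1x1 pointwise projection
            nn.BatchNorm2d(out_ch),                 # no activation after project
        )

    def forward(self, x):
        out = self.block(x)
        return x + out if self.use_skip else out

block = InvertedResidual(32, 32)       # stride 1, equal channels -> skip used
y = block(torch.randn(1, 32, 56, 56))  # shape preserved: (1, 32, 56, 56)
```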
no code implementations • 1 Jan 2021 • Yuanpeng Li, Liang Zhao, Joel Hestness, Kenneth Church, Mohamed Elhoseiny
In this paper, we argue that gradient descent is one of the reasons that make compositionality learning hard during neural network optimization.
no code implementations • 1 Jan 2021 • Yuanpeng Li, Liang Zhao, Joel Hestness, Ka Yee Lun, Kenneth Church, Mohamed Elhoseiny
To the best of our knowledge, this is the first work to focus on the transferability of compositionality, and it is orthogonal to existing efforts to learn compositional representations within the training distribution.
no code implementations • 25 Mar 2020 • Atli Kosson, Vitaliy Chiley, Abhinav Venigalla, Joel Hestness, Urs Köster
New hardware can substantially increase the speed and efficiency of deep neural network training.
1 code implementation • IJCNLP 2019 • Yuanpeng Li, Liang Zhao, Jian-Yu Wang, Joel Hestness
Compositional generalization is a basic mechanism in human language learning, but current neural networks lack such ability.
1 code implementation • 3 Sep 2019 • Joel Hestness, Newsha Ardalani, Greg Diamos
However, recent prior work shows that as dataset sizes grow, DL model accuracy and model size grow predictably.
no code implementations • ICLR 2019 • Newsha Ardalani, Joel Hestness, Gregory Diamos
Long-held conventional wisdom states that larger models train more slowly when using gradient descent.
no code implementations • 27 Sep 2018 • Joel Hestness, Sharan Narang, Newsha Ardalani, Heewoo Jun, Hassan Kianinejad, Md. Mostofa Ali Patwary, Yang Yang, Yanqi Zhou, Gregory Diamos, Kenneth Church
As the pace of deep learning innovation accelerates, it becomes increasingly important to organize the space of problems by relative difficulty.
no code implementations • 1 Dec 2017 • Joel Hestness, Sharan Narang, Newsha Ardalani, Gregory Diamos, Heewoo Jun, Hassan Kianinejad, Md. Mostofa Ali Patwary, Yang Yang, Yanqi Zhou
As DL application domains grow, we would like a deeper understanding of the relationships between training set size, computational scale, and model accuracy improvements to advance the state-of-the-art.
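As a toy illustration of the kind of relationship this line of work measures, generalization error on a learning curve can be fit to a power law eps(m) = alpha * m**beta in log-log space. The data points below are invented for the sketch, not results from the paper:

```python
import numpy as np

# Hypothetical learning curve: training set sizes vs. validation error.
m = np.array([1e4, 1e5, 1e6, 1e7])
eps = np.array([0.42, 0.27, 0.17, 0.11])

# Fit log(eps) = log(alpha) + beta * log(m) by least squares;
# beta < 0 is the empirical scaling exponent.
beta, log_alpha = np.polyfit(np.log(m), np.log(eps), 1)
alpha = np.exp(log_alpha)
print(f"alpha = {alpha:.3f}, beta = {beta:.3f}")
```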
no code implementations • 15 Mar 2017 • Sercan O. Arik, Markus Kliegl, Rewon Child, Joel Hestness, Andrew Gibiansky, Chris Fougner, Ryan Prenger, Adam Coates
Keyword spotting (KWS) constitutes a major component of human-technology interfaces.