no code implementations • 1 Mar 2024 • Vithursan Thangarasa, Mahmoud Salem, Shreyas Saxena, Kevin Leong, Joel Hestness, Sean Lie
Large language models (LLMs) are typically trained on general source data for various domains, but a recent surge in domain-specific LLMs has shown their potential to outperform general-purpose models in domain-specific tasks (e.g., biomedicine).
Ranked #10 on Question Answering on PubMedQA
1 code implementation • 18 Oct 2023 • Faisal Al-Khateeb, Nolan Dey, Daria Soboleva, Joel Hestness
Linear position interpolation helps pre-trained models using rotary position embeddings (RoPE) to extrapolate to longer sequence lengths.
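A minimal sketch of the linear position interpolation idea with RoPE, assuming the standard rotary formulation: positions beyond the trained context are rescaled back into the trained range before the rotary angles are computed. The lengths and dimensions here are illustrative assumptions, not taken from the paper's code.

```python
import numpy as np

def rope_angles(positions, dim, base=10000.0, scale=1.0):
    # Standard RoPE angles theta_i = pos * base^(-2i/dim); linear position
    # interpolation simply rescales positions so a longer sequence maps
    # back into the position range seen during pre-training.
    inv_freq = base ** (-np.arange(0, dim, 2) / dim)
    return np.outer(positions * scale, inv_freq)

# Illustrative lengths: trained on 2048 tokens, extrapolating to 8192.
train_len, target_len = 2048, 8192
positions = np.arange(target_len)
angles = rope_angles(positions, dim=64, scale=train_len / target_len)
cos, sin = np.cos(angles), np.sin(angles)  # rotate query/key pairs as usual
```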
1 code implementation • 20 Sep 2023 • Nolan Dey, Daria Soboleva, Faisal Al-Khateeb, Bowen Yang, Ribhu Pathria, Hemant Khachane, Shaheer Muhammad, Zhiming Chen, Robert Myers, Jacob Robert Steeves, Natalia Vassilieva, Marvin Tom, Joel Hestness
BTLM-3B-8K is available under an Apache 2.0 license on Hugging Face: https://huggingface.co/cerebras/btlm-3b-8k-base.
1 code implementation • 19 Sep 2023 • Zhiqiang Shen, Tianhua Tao, Liqun Ma, Willie Neiswanger, Zhengzhong Liu, Hongyi Wang, Bowen Tan, Joel Hestness, Natalia Vassilieva, Daria Soboleva, Eric Xing
This paper aims to understand the impacts of various data combinations (e.g., web text, Wikipedia, GitHub, books) on the pretraining of large language models using SlimPajama.
no code implementations • 30 Aug 2023 • Neha Sengupta, Sunil Kumar Sahu, Bokang Jia, Satheesh Katipomu, Haonan Li, Fajri Koto, William Marshall, Gurpreet Gosal, Cynthia Liu, Zhiming Chen, Osama Mohammed Afzal, Samta Kamboj, Onkar Pandit, Rahul Pal, Lalit Pradhan, Zain Muhammad Mujahid, Massa Baali, Xudong Han, Sondos Mahmoud Bsharat, Alham Fikri Aji, Zhiqiang Shen, Zhengzhong Liu, Natalia Vassilieva, Joel Hestness, Andy Hock, Andrew Feldman, Jonathan Lee, Andrew Jackson, Hector Xuguang Ren, Preslav Nakov, Timothy Baldwin, Eric Xing
We release two open versions of the model -- the foundation Jais model, and an instruction-tuned Jais-chat variant -- with the aim of promoting research on Arabic LLMs.
2 code implementations • 6 Apr 2023 • Nolan Dey, Gurpreet Gosal, Zhiming Chen, Hemant Khachane, William Marshall, Ribhu Pathria, Marvin Tom, Joel Hestness
We study recent research advances that improve large language models through efficient pre-training and scaling, as well as open datasets and tools.
1 code implementation • 28 Jun 2022 • Vitaliy Chiley, Vithursan Thangarasa, Abhay Gupta, Anshul Samar, Joel Hestness, Dennis Decoste
However, training them requires substantial accelerator memory for saving large, multi-resolution activations.
Ranked #313 on Image Classification on ImageNet (using extra training data)
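The activation-memory pressure described in this entry is what reversible designs address: when a block's inputs can be recomputed exactly from its outputs, intermediate activations need not be stored for the backward pass. A minimal NumPy sketch of the generic RevNet-style coupling (not the paper's actual RevBiFPN blocks; f and g are stand-ins for arbitrary sub-networks):

```python
import numpy as np

def f(x):  # stand-in for an arbitrary sub-network
    return np.tanh(x)

def g(x):  # another stand-in
    return 0.5 * x

def rev_forward(x1, x2):
    # Reversible coupling: the outputs fully determine the inputs,
    # so x1 and x2 need not be kept in accelerator memory.
    y1 = x1 + f(x2)
    y2 = x2 + g(y1)
    return y1, y2

def rev_inverse(y1, y2):
    # Recompute the inputs from the outputs during backprop.
    x2 = y2 - g(y1)
    x1 = y1 - f(x2)
    return x1, x2

x1, x2 = np.random.randn(4), np.random.randn(4)
assert np.allclose((x1, x2), rev_inverse(*rev_forward(x1, x2)))
```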
no code implementations • 17 Mar 2022 • Ehsan Valavi, Joel Hestness, Newsha Ardalani, Marco Iansiti
In addition, we argue that increasing the stock of data by including older datasets may, in fact, damage the model's accuracy.
no code implementations • 17 Mar 2022 • Ehsan Valavi, Joel Hestness, Marco Iansiti, Newsha Ardalani, Feng Zhu, Karim R. Lakhani
Relating the text topics to various business areas of interest, we argue that competing in a business area where data value decays rapidly changes the strategies needed to acquire a competitive advantage.
1 code implementation • 6 Jan 2022 • Yuanpeng Li, Joel Hestness, Mohamed Elhoseiny, Liang Zhao, Kenneth Church
This paper proposes an efficient approach to learning disentangled representations with causal mechanisms, based on the difference of conditional probabilities between the original and new distributions.
no code implementations • 19 Apr 2021 • Mihir Pendse, Vithursan Thangarasa, Vitaliy Chiley, Ryan Holmdahl, Joel Hestness, Dennis Decoste
The inverted residual bottleneck block uses lightweight depthwise separable convolutions to reduce computation by decomposing convolutions into a pointwise convolution and a depthwise convolution.
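For illustration, a PyTorch sketch of an inverted residual bottleneck in the MobileNetV2 style this entry describes: a 1x1 pointwise expansion, a 3x3 depthwise convolution, and a 1x1 pointwise projection. The channel counts and expansion factor are illustrative assumptions, not the paper's configuration.

```python
import torch
import torch.nn as nn

class InvertedResidual(nn.Module):
    """Inverted residual bottleneck (MobileNetV2-style), for illustration."""
    def __init__(self, in_ch, out_ch, stride=1, expand=6):
        super().__init__()
        mid = in_ch * expand
        self.use_skip = stride == 1 and in_ch == out_ch
        self.block = nn.Sequential(
            nn.Conv2d(in_ch, mid, 1, bias=False),   # 1x1 pointwise expansion
            nn.BatchNorm2d(mid), nn.ReLU6(inplace=True),
            nn.Conv2d(mid, mid, 3, stride, 1,       # 3x3 depthwise:
                      groups=mid, bias=False),      # one filter per channel
            nn.BatchNorm2d(mid), nn.ReLU6(inplace=True),
            nn.Conv2d(mid, out_ch, 1, bias=False),  # 1x1 pointwise projection
            nn.BatchNorm2d(out_ch),                 # no activation after project
        )

    def forward(self, x):
        out = self.block(x)
        return x + out if self.use_skip else out

block = InvertedResidual(32, 32)       # stride 1, equal channels -> skip used
y = block(torch.randn(1, 32, 56, 56))  # shape preserved: (1, 32, 56, 56)
```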
no code implementations • 1 Jan 2021 • Yuanpeng Li, Liang Zhao, Joel Hestness, Kenneth Church, Mohamed Elhoseiny
In this paper, we argue that gradient descent is one of the reasons that make compositionality learning hard during neural network optimization.
no code implementations • 1 Jan 2021 • Yuanpeng Li, Liang Zhao, Joel Hestness, Ka Yee Lun, Kenneth Church, Mohamed Elhoseiny
To the best of our knowledge, this is the first work to focus on the transferability of compositionality, and it is orthogonal to existing efforts to learn compositional representations within the training distribution.
no code implementations • 25 Mar 2020 • Atli Kosson, Vitaliy Chiley, Abhinav Venigalla, Joel Hestness, Urs Köster
New hardware can substantially increase the speed and efficiency of deep neural network training.
1 code implementation • IJCNLP 2019 • Yuanpeng Li, Liang Zhao, Jian-Yu Wang, Joel Hestness
Compositional generalization is a basic mechanism in human language learning, but current neural networks lack such ability.
1 code implementation • 3 Sep 2019 • Joel Hestness, Newsha Ardalani, Greg Diamos
However, recent prior work shows that as dataset sizes grow, DL model accuracy and model size grow predictably.
no code implementations • ICLR 2019 • Newsha Ardalani, Joel Hestness, Gregory Diamos
Long-held conventional wisdom states that larger models train more slowly when using gradient descent.
no code implementations • 27 Sep 2018 • Joel Hestness, Sharan Narang, Newsha Ardalani, Heewoo Jun, Hassan Kianinejad, Md. Mostofa Ali Patwary, Yang Yang, Yanqi Zhou, Gregory Diamos, Kenneth Church
As the pace of deep learning innovation accelerates, it becomes increasingly important to organize the space of problems by relative difficulty.
no code implementations • 1 Dec 2017 • Joel Hestness, Sharan Narang, Newsha Ardalani, Gregory Diamos, Heewoo Jun, Hassan Kianinejad, Md. Mostofa Ali Patwary, Yang Yang, Yanqi Zhou
As DL application domains grow, we would like a deeper understanding of the relationships between training set size, computational scale, and model accuracy improvements to advance the state-of-the-art.
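As a toy illustration of the kind of relationship this line of work measures, generalization error on a learning curve can be fit to a power law eps(m) = alpha * m**beta in log-log space. The data points below are invented for the sketch, not results from the paper:

```python
import numpy as np

# Hypothetical learning curve: training set sizes vs. validation error.
m = np.array([1e4, 1e5, 1e6, 1e7])
eps = np.array([0.42, 0.27, 0.17, 0.11])

# Fit log(eps) = log(alpha) + beta * log(m) by least squares;
# beta < 0 is the empirical scaling exponent.
beta, log_alpha = np.polyfit(np.log(m), np.log(eps), 1)
alpha = np.exp(log_alpha)
print(f"alpha = {alpha:.3f}, beta = {beta:.3f}")
```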
no code implementations • 15 Mar 2017 • Sercan O. Arik, Markus Kliegl, Rewon Child, Joel Hestness, Andrew Gibiansky, Chris Fougner, Ryan Prenger, Adam Coates
Keyword spotting (KWS) constitutes a major component of human-technology interfaces.