Pathways Language Model

Introduced by Chowdhery et al. in PaLM: Scaling Language Modeling with Pathways

PaLM (Pathways Language Model) uses a standard Transformer architecture (Vaswani et al., 2017) in a decoder-only setup (i.e., each timestep can attend only to itself and past timesteps), with several modifications. PaLM is a 540-billion-parameter, densely activated, autoregressive Transformer trained on 780 billion tokens. Training relies on Pathways (Barham et al., 2022), which enables highly efficient training of very large neural networks across thousands of accelerator chips.
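
The decoder-only constraint described above (each timestep attends only to itself and earlier timesteps) is usually enforced with a causal mask applied before the attention softmax. The following is a minimal sketch for a single attention head in plain Python. The function names `causal_mask` and `causal_attention_weights` are hypothetical, and the code does not reproduce PaLM's actual implementation or any of its architectural modifications.

```python
import math


def causal_mask(seq_len):
    # mask[i][j] is True when query position i may attend to key position j,
    # i.e. only when j is the same position or an earlier one.
    return [[j <= i for j in range(seq_len)] for i in range(seq_len)]


def causal_attention_weights(scores):
    # scores: square list of lists holding raw attention logits for one head
    # (illustrative input; a real model computes these from queries and keys).
    n = len(scores)
    mask = causal_mask(n)
    weights = []
    for i in range(n):
        # Softmax over visible positions only; future positions get weight 0.
        visible = [scores[i][j] for j in range(n) if mask[i][j]]
        row_max = max(visible)  # subtract the max for numerical stability
        exps = [
            math.exp(scores[i][j] - row_max) if mask[i][j] else 0.0
            for j in range(n)
        ]
        total = sum(exps)
        weights.append([e / total for e in exps])
    return weights


# Usage: with uniform logits, each position spreads its attention evenly
# over itself and the positions before it.
uniform_logits = [[0.0] * 4 for _ in range(4)]
for row in causal_attention_weights(uniform_logits):
    print(row)
```

Because every row keeps at least its own diagonal entry, the softmax denominator is never zero, and no position ever receives weight from a later token.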

Source: PaLM: Scaling Language Modeling with Pathways

Tasks

Task	Papers	Share
Language Modelling	22	9.40%
Question Answering	15	6.41%
In-Context Learning	9	3.85%
Large Language Model	8	3.42%
GSM8K	6	2.56%
Multi-task Language Understanding	6	2.56%
Few-Shot Learning	6	2.56%
Common Sense Reasoning	6	2.56%
Retrieval	5	2.14%

Categories

Transformers