no code implementations • 26 Jul 2023 • Tokio Kajitsuka, Issei Sato
Existing analyses of the expressive capacity of Transformer models have required an excessive number of layers for data memorization, leading to a discrepancy with the Transformers actually used in practice.