Search Results for author: Tokio Kajitsuka

Found 1 paper, 0 papers with code

Are Transformers with One Layer Self-Attention Using Low-Rank Weight Matrices Universal Approximators?

no code implementations • 26 Jul 2023 • Tokio Kajitsuka, Issei Sato

Existing analyses of the expressive capacity of Transformer models have required excessively deep layers for data memorization, leading to a discrepancy with the Transformers actually used in practice.

Memorization
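
To make the architecture named in the title concrete, here is a minimal, hypothetical sketch of a single self-attention layer whose query, key, and value projections are constrained to low rank by factoring each weight matrix into two thin linear maps. The class, parameter names (d_model, rank), and the PyTorch framing are illustrative assumptions, not the authors' construction or notation.

```python
import torch
import torch.nn as nn


class LowRankSelfAttention(nn.Module):
    """Single-head self-attention with rank-constrained weight matrices.

    Illustrative sketch only; it mirrors the setting named in the paper
    title (one-layer self-attention, low-rank weights) but is not the
    authors' construction.
    """

    def __init__(self, d_model: int, rank: int):
        super().__init__()
        # Each projection is W = B @ A with A: d_model -> rank and
        # B: rank -> d_model, so rank(W) <= rank.
        self.q = nn.Sequential(nn.Linear(d_model, rank, bias=False),
                               nn.Linear(rank, d_model, bias=False))
        self.k = nn.Sequential(nn.Linear(d_model, rank, bias=False),
                               nn.Linear(rank, d_model, bias=False))
        self.v = nn.Sequential(nn.Linear(d_model, rank, bias=False),
                               nn.Linear(rank, d_model, bias=False))
        self.scale = d_model ** -0.5

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model)
        q, k, v = self.q(x), self.k(x), self.v(x)
        attn = torch.softmax(q @ k.transpose(-2, -1) * self.scale, dim=-1)
        return attn @ v


# Usage: one layer with rank-1 projections on a toy batch.
layer = LowRankSelfAttention(d_model=16, rank=1)
out = layer(torch.randn(2, 8, 16))
print(out.shape)  # torch.Size([2, 8, 16])
```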
