Search Results

DenseFormer: Enhancing Information Flow in Transformers via Depth Weighted Averaging

zhaoolee/garss 4 Feb 2024

The transformer architecture by Vaswani et al. (2017) is now ubiquitous across application domains, from natural language processing to speech processing and image understanding.
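
The mechanism named in the title, depth weighted averaging, admits a compact sketch: after each block, the running representation is replaced by a learned weighted combination of the outputs of all earlier blocks, with the token embeddings as depth 0. The module below is a rough illustration under that reading; the layer shapes, initialization, and names are mine, not the paper's.

```python
import torch
import torch.nn as nn

class DWABlock(nn.Module):
    """One transformer block followed by a depth-weighted average (DWA).

    Hypothetical sketch: after block i, the representation becomes a learned
    weighted combination of the outputs of blocks 0..i, where index 0 holds
    the token embeddings.
    """
    def __init__(self, d_model: int, depth_index: int):
        super().__init__()
        self.block = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        weights = torch.zeros(depth_index + 2)   # embeddings + every block so far
        weights[-1] = 1.0                        # init as a plain residual stack
        self.alpha = nn.Parameter(weights)

    def forward(self, x, history):
        history = history + [self.block(x)]
        stacked = torch.stack(history)                        # (depth, batch, seq, d)
        return torch.einsum("l,lbsd->bsd", self.alpha, stacked), history

def denseformer(x, n_layers=4, d_model=64):
    blocks = [DWABlock(d_model, i) for i in range(n_layers)]
    history = [x]                                             # depth 0: embeddings
    for blk in blocks:
        x, history = blk(x, history)
    return x

print(denseformer(torch.randn(2, 16, 64)).shape)              # torch.Size([2, 16, 64])
```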

The Elements of Differentiable Programming

zhaoolee/garss 21 Mar 2024

Artificial intelligence has recently experienced remarkable advances, fueled by large models, vast datasets, accelerated hardware, and, last but not least, the transformative power of differentiable programming.
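
To make "differentiable programming" concrete: an ordinary numeric program is assembled from differentiable primitives, and automatic differentiation supplies gradients of the whole program for optimization. A minimal sketch; the toy objective and the use of PyTorch are my choices, not the book's.

```python
import torch

# An arbitrary differentiable program: its derivative is obtained
# automatically, so it can be tuned by gradient descent.
def loss(theta: torch.Tensor) -> torch.Tensor:
    prediction = torch.sin(theta) * 3.0 + theta ** 2
    return (prediction - 2.0) ** 2

theta = torch.tensor(0.1, requires_grad=True)
for _ in range(200):
    value = loss(theta)
    value.backward()                 # reverse-mode autodiff through the program
    with torch.no_grad():
        theta -= 0.05 * theta.grad   # one gradient-descent step
        theta.grad.zero_()

print(float(theta), float(loss(theta)))
```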

FABLES: Evaluating faithfulness and content selection in book-length summarization

zhaoolee/garss 1 Apr 2024

While LLM-based auto-raters have proven reliable for factuality and coherence in other settings, we implement several LLM raters of faithfulness and find that none correlates strongly with human annotations, especially with regard to detecting unfaithful claims.

Long-Context Understanding
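
A claim-level faithfulness auto-rater of the kind evaluated above can be sketched as follows; `call_llm` is a hypothetical stand-in for a model API, and the prompt wording is illustrative rather than the one used in FABLES.

```python
from typing import Callable, List

def rate_faithfulness(source: str, claims: List[str],
                      call_llm: Callable[[str], str]) -> float:
    """Fraction of summary claims an LLM judge labels as supported by the source.

    Rough sketch of a claim-level faithfulness rater; `call_llm` is a
    hypothetical text-in/text-out model call, and the prompt is illustrative.
    """
    supported = 0
    for claim in claims:
        prompt = (
            "Source text:\n" + source + "\n\n"
            "Claim: " + claim + "\n"
            "Is the claim fully supported by the source? Answer YES or NO."
        )
        verdict = call_llm(prompt).strip().upper()
        supported += verdict.startswith("YES")
    return supported / max(len(claims), 1)

# Example with a dummy judge that always answers YES.
score = rate_faithfulness("Alice met Bob in Paris.",
                          ["Alice met Bob.", "They met in Rome."],
                          call_llm=lambda p: "YES")
print(score)   # 1.0 here; a reliable judge should flag the second claim
```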

Rendering string diagrams recursively

zhaoolee/garss 3 Apr 2024

The algebraic representation can be seen as a term of a free monoidal category or a proof tree for a small fragment of linear logic.

Category Theory · Computational Geometry
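
A term of a free monoidal category, as mentioned in the snippet, can be written down as a small algebraic datatype: named generators (boxes) combined by sequential and parallel composition, plus identities on wires. The constructor and field names below are illustrative, not taken from the paper.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Gen:                 # a generator (box) with typed ports
    name: str
    inputs: int            # number of input wires
    outputs: int           # number of output wires

@dataclass(frozen=True)
class Id:                  # identity on n parallel wires
    n: int

@dataclass(frozen=True)
class Seq:                 # f ; g  (well-typed when cod(f) == dom(g))
    f: "Term"
    g: "Term"

@dataclass(frozen=True)
class Par:                 # f ⊗ g  (diagrams side by side)
    f: "Term"
    g: "Term"

Term = Gen | Id | Seq | Par

def dom(t: Term) -> int:
    match t:
        case Gen(_, d, _): return d
        case Id(n): return n
        case Seq(f, _): return dom(f)
        case Par(f, g): return dom(f) + dom(g)

def cod(t: Term) -> int:
    match t:
        case Gen(_, _, c): return c
        case Id(n): return n
        case Seq(_, g): return cod(g)
        case Par(f, g): return cod(f) + cod(g)

# (f ⊗ id) ; g — a small diagram with two boxes
f = Gen("f", 1, 2)
g = Gen("g", 3, 1)
diagram = Seq(Par(f, Id(1)), g)
print(dom(diagram), cod(diagram))   # 2 1
```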

Mixture-of-Depths: Dynamically allocating compute in transformer-based language models

zhaoolee/garss 2 Apr 2024

Our method enforces a total compute budget by capping the number of tokens ($k$) that can participate in the self-attention and MLP computations at a given layer.
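
The capping described here amounts to top-$k$ routing: a learned router scores each token, the $k$ highest-scoring tokens go through the layer's self-attention and MLP, and the rest bypass it on the residual stream. A rough sketch under that reading; the router design and all names are mine rather than the paper's.

```python
import torch
import torch.nn as nn

class MoDLayer(nn.Module):
    """Sketch of a Mixture-of-Depths-style layer: at most k tokens per
    sequence are processed by the block; the others skip it on the residual
    stream. Router design and parameter names are illustrative."""
    def __init__(self, d_model: int, k: int):
        super().__init__()
        self.block = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.router = nn.Linear(d_model, 1)     # scalar score per token
        self.k = k

    def forward(self, x):                       # x: (batch, seq, d_model)
        scores = self.router(x).squeeze(-1)     # (batch, seq)
        idx = scores.topk(self.k, dim=-1).indices
        out = x.clone()
        for b in range(x.size(0)):              # explicit loop for clarity
            chosen = x[b, idx[b]].unsqueeze(0)            # (1, k, d)
            updated = self.block(chosen).squeeze(0)       # (k, d)
            gate = torch.sigmoid(scores[b, idx[b]]).unsqueeze(-1)
            # gate keeps the routing decision differentiable w.r.t. the router
            out[b, idx[b]] = x[b, idx[b]] + gate * (updated - x[b, idx[b]])
        return out

layer = MoDLayer(d_model=64, k=4)
print(layer(torch.randn(2, 16, 64)).shape)      # torch.Size([2, 16, 64])
```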
