Search Results for author: Srikant Bharadwaj

Found 2 papers, 0 papers with code

Lean Attention: Hardware-Aware Scalable Attention Mechanism for the Decode-Phase of Transformers

No code implementations · 17 May 2024 · Rya Sanovar, Srikant Bharadwaj, Renee St. Amant, Victor Rühle, Saravan Rajmohan

We identify that the associative property of online softmax lets it be treated as a reduction operation, thus allowing us to parallelize the attention computation over these large context lengths.
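The idea that online softmax is an associative reduction can be made concrete with a small sketch. It assumes single-query attention with the query-key scores already computed. Each chunk of keys produces a partial state (running max m, denominator l, unnormalized output o), and any two states can be merged in any grouping. The names partial_state, combine, finalize and chunked_attention are hypothetical and only illustrate the general online-softmax merge. This is not the Lean Attention kernel itself.

```python
import math
from functools import reduce


def partial_state(scores, values):
    """Partial attention state for one chunk of keys (hypothetical helper).

    Returns (m, l, o): the max score m, the softmax denominator
    l = sum(exp(s - m)), and the unnormalized output o = sum(exp(s - m) * v).
    """
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    l = sum(weights)
    dim = len(values[0])
    o = [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]
    return (m, l, o)


def combine(a, b):
    """Associative merge of two partial states: the reduction operator."""
    m_a, l_a, o_a = a
    m_b, l_b, o_b = b
    m = max(m_a, m_b)
    # Rescale both states to the shared max so exponents never overflow.
    sa, sb = math.exp(m_a - m), math.exp(m_b - m)
    return (m, l_a * sa + l_b * sb, [x * sa + y * sb for x, y in zip(o_a, o_b)])


def finalize(state):
    """Normalize the reduced state into the attention output."""
    _, l, o = state
    return [x / l for x in o]


def attention_reference(scores, values):
    """Plain softmax attention over the whole sequence, for comparison."""
    m = max(scores)
    w = [math.exp(s - m) for s in scores]
    z = sum(w)
    dim = len(values[0])
    return [sum(wi * v[d] for wi, v in zip(w, values)) / z for d in range(dim)]


def chunked_attention(scores, values, chunk):
    """Split keys into chunks (e.g. one per worker), then reduce their states."""
    states = [
        partial_state(scores[i:i + chunk], values[i:i + chunk])
        for i in range(0, len(scores), chunk)
    ]
    return finalize(reduce(combine, states))


if __name__ == "__main__":
    scores = [0.5, 2.0, -1.0, 3.0, 0.0, 1.5, -0.5]
    values = [[float(i), float(i % 3)] for i in range(len(scores))]
    ref = attention_reference(scores, values)
    for chunk in (1, 2, 3, 7):
        out = chunked_attention(scores, values, chunk)
        assert all(abs(x - y) < 1e-9 for x, y in zip(out, ref))
    print("chunked result matches full softmax attention")
```

Because combine is associative, the per-chunk states can be merged in a tree or in any order across parallel workers, and the result matches the full softmax up to floating-point rounding.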
