Attention Dropout

Attention Dropout is a form of dropout used in attention-based architectures, where elements of the attention weights, i.e. the output of the softmax in the attention equation, are randomly dropped. For example, in scaled dot-product attention, dropout is applied to the softmax term before it is multiplied by the values $V$:

$$ {\text{Attention}}(Q, K, V) = \text{softmax}\left(\frac{QK^{T}}{\sqrt{d_k}}\right)V $$
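
A common way to realize this is to apply standard dropout to the attention weights (the softmax output) before they are multiplied with $V$. Below is a minimal sketch assuming a PyTorch setting; the module name, dropout rate, and tensor shapes are illustrative choices, not taken from any particular paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ScaledDotProductAttention(nn.Module):
    """Scaled dot-product attention with attention dropout.

    Dropout is applied to the attention weights (the softmax output),
    so random query-key connections are zeroed during training.
    """

    def __init__(self, dropout_p: float = 0.1):
        super().__init__()
        self.dropout = nn.Dropout(dropout_p)  # attention dropout

    def forward(self, q, k, v, mask=None):
        d_k = q.size(-1)
        # Scores: QK^T / sqrt(d_k)
        scores = torch.matmul(q, k.transpose(-2, -1)) / (d_k ** 0.5)
        if mask is not None:
            scores = scores.masked_fill(mask == 0, float("-inf"))
        # Attention weights: softmax over the key dimension
        weights = F.softmax(scores, dim=-1)
        # Attention dropout: randomly drop elements of the softmax output
        weights = self.dropout(weights)
        # Weighted sum of the values
        return torch.matmul(weights, v), weights

# Example usage with illustrative shapes: (batch, heads, seq_len, d_k)
attn = ScaledDotProductAttention(dropout_p=0.1)
q = k = v = torch.randn(2, 8, 16, 64)
output, attn_weights = attn(q, k, v)  # output: (2, 8, 16, 64)
```

As with ordinary dropout, `nn.Dropout` rescales the surviving weights by $1/(1-p)$ during training and is a no-op in evaluation mode (`model.eval()`); the rows of the attention weights therefore no longer sum to 1 after dropout, which is the expected behavior.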

Tasks


Task Papers Share
RAG 279 18.89%
Retrieval-augmented Generation 234 15.84%
Retrieval 197 13.34%
Question Answering 68 4.60%
Language Modeling 40 2.71%
Language Modelling 40 2.71%
Large Language Model 36 2.44%
Information Retrieval 26 1.76%
Benchmarking 19 1.29%

Components


Component   Type
Dropout     Regularization

Categories

Regularization