Attention Dropout is a type of dropout used in attention-based architectures, where elements are randomly dropped from the output of the softmax in the attention equation. For example, in scaled dot-product attention, dropout is applied to the attention weights (the softmax term) before they are multiplied by $V$:
$$ {\text{Attention}}(Q, K, V) = \text{softmax}\left(\frac{QK^{T}}{\sqrt{d_k}}\right)V $$
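A minimal sketch of scaled dot-product attention with attention dropout, written in PyTorch; the function name, tensor shapes, and dropout rate are illustrative assumptions, not from the source.

```python
import math
import torch
import torch.nn.functional as F

def scaled_dot_product_attention_with_dropout(q, k, v, dropout_p=0.1, training=True):
    """q, k, v: tensors of shape (batch, heads, seq_len, d_k). Shapes are assumed for illustration."""
    d_k = q.size(-1)
    # Attention scores scaled by sqrt(d_k)
    scores = torch.matmul(q, k.transpose(-2, -1)) / math.sqrt(d_k)
    # Softmax over the key dimension gives the attention weights
    attn = F.softmax(scores, dim=-1)
    # Attention dropout: randomly zero entries of the attention weights
    attn = F.dropout(attn, p=dropout_p, training=training)
    # Weighted sum of the values
    return torch.matmul(attn, v), attn
```

Because the zeroed entries are sampled independently at each forward pass, the model cannot rely on any single query-key interaction, which acts as a regularizer on the attention weights.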
| Task | Papers | Share |
|---|---|---|
| RAG | 279 | 18.89% |
| Retrieval-augmented Generation | 234 | 15.84% |
| Retrieval | 197 | 13.34% |
| Question Answering | 68 | 4.60% |
| Language Modeling | 40 | 2.71% |
| Language Modelling | 40 | 2.71% |
| Large Language Model | 36 | 2.44% |
| Information Retrieval | 26 | 1.76% |
| Benchmarking | 19 | 1.29% |