MAML, or Model-Agnostic Meta-Learning, is a model- and task-agnostic meta-learning algorithm that trains a model's parameters such that a small number of gradient updates leads to fast learning on a new task.
Consider a model represented by a parametrized function $f_{\theta}$ with parameters $\theta$. When adapting to a new task $\mathcal{T}_{i}$, the model’s parameters $\theta$ become $\theta'_{i}$. With MAML, the updated parameter vector $\theta'_{i}$ is computed using one or more gradient descent updates on task $\mathcal{T}_{i}$. For example, when using one gradient update,
$$ \theta'_{i} = \theta - \alpha\nabla_{\theta}\mathcal{L}_{\mathcal{T}_{i}}\left(f_{\theta}\right) $$
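This inner-loop update can be sketched directly with automatic differentiation. The following is a minimal illustration, not code from the paper: the linear model, toy data, and the names `predict`, `task_loss`, and `inner_update` are all assumptions made for the example.

```python
import jax
import jax.numpy as jnp

# Hypothetical model: a linear regressor f_theta(x) = x @ theta.
def predict(theta, x):
    return x @ theta

# Per-task loss L_{T_i}(f_theta): mean squared error on the task's data.
def task_loss(theta, x, y):
    return jnp.mean((predict(theta, x) - y) ** 2)

# One inner gradient step: theta'_i = theta - alpha * grad_theta L_{T_i}(f_theta).
def inner_update(theta, x, y, alpha=0.1):
    grads = jax.grad(task_loss)(theta, x, y)
    return theta - alpha * grads

theta = jnp.zeros(3)
x = jnp.ones((4, 3))
y = jnp.ones(4)
theta_prime = inner_update(theta, x, y)  # adapted parameters for this task
```

For multiple inner steps, `inner_update` is simply applied repeatedly to its own output.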
The step size $\alpha$ may be fixed as a hyperparameter or meta-learned. The model parameters are trained by optimizing the performance of $f_{\theta'_{i}}$ with respect to $\theta$ across tasks sampled from $p\left(\mathcal{T}\right)$. More concretely, the meta-objective is as follows:
$$ \min_{\theta} \sum_{\mathcal{T}_{i} \sim p\left(\mathcal{T}\right)} \mathcal{L}_{\mathcal{T_{i}}}\left(f_{\theta'_{i}}\right) = \sum_{\mathcal{T}_{i} \sim p\left(\mathcal{T}\right)} \mathcal{L}_{\mathcal{T_{i}}}\left(f_{\theta - \alpha\nabla_{\theta}\mathcal{L}_{\mathcal{T}_{i}}\left(f_{\theta}\right)}\right) $$
Note that the meta-optimization is performed over the model parameters $\theta$, whereas the objective is computed using the updated model parameters $\theta'_{i}$. In effect, MAML aims to optimize the model parameters such that one or a small number of gradient steps on a new task will produce maximally effective behavior on that task. The meta-optimization across tasks is performed via stochastic gradient descent (SGD), such that the model parameters $\theta$ are updated as follows:
$$ \theta \leftarrow \theta - \beta\nabla_{\theta} \sum_{\mathcal{T}_{i} \sim p\left(\mathcal{T}\right)} \mathcal{L}_{\mathcal{T_{i}}}\left(f_{\theta'_{i}}\right)$$
where $\beta$ is the meta step size.
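A full meta-step — adapt per task, then differentiate the adapted loss back through the inner update — can be sketched as follows. This is an illustrative sketch, not the authors' implementation: the linear model, the synthetic regression tasks, and the names `adapted_loss` and `meta_step` are assumptions made for the example.

```python
import jax
import jax.numpy as jnp

def task_loss(theta, x, y):
    return jnp.mean((x @ theta - y) ** 2)

def adapted_loss(theta, task, alpha=0.1):
    x_tr, y_tr, x_val, y_val = task
    # Inner loop: one gradient step on the task's training split.
    theta_prime = theta - alpha * jax.grad(task_loss)(theta, x_tr, y_tr)
    # Outer objective: loss of the adapted parameters on held-out task data.
    return task_loss(theta_prime, x_val, y_val)

def meta_step(theta, tasks, beta=0.01):
    # Sum adapted losses over the sampled task batch, then take one SGD
    # step on theta; jax.grad differentiates *through* the inner update,
    # which is where the gradient-through-gradient (second-order) term arises.
    def meta_objective(theta):
        return sum(adapted_loss(theta, t) for t in tasks)
    return theta - beta * jax.grad(meta_objective)(theta)

# Two synthetic linear-regression tasks, each with a train and val split.
key = jax.random.PRNGKey(0)
theta = jnp.zeros(3)
tasks = []
for _ in range(2):
    k1, k2, key = jax.random.split(key, 3)
    w = jax.random.normal(k1, (3,))       # task-specific ground truth
    x = jax.random.normal(k2, (8, 3))
    tasks.append((x[:4], x[:4] @ w, x[4:], x[4:] @ w))
theta = meta_step(theta, tasks)           # one meta-update on theta
```

The first-order MAML variant discussed in the paper corresponds to stopping gradients through the inner update (e.g. wrapping `theta_prime` in `jax.lax.stop_gradient` arithmetic), trading the second-order term for cheaper computation.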
Source: Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks
| Task | Papers | Share |
|---|---|---|
| Meta-Learning | 190 | 33.93% |
| Few-Shot Learning | 69 | 12.32% |
| Image Classification | 20 | 3.57% |
| Reinforcement Learning (RL) | 19 | 3.39% |
| Few-Shot Image Classification | 16 | 2.86% |
| General Classification | 16 | 2.86% |
| Classification | 12 | 2.14% |
| Federated Learning | 10 | 1.79% |
| Domain Adaptation | 6 | 1.07% |